The MASS package is required to run the CPASSOC. Unfortunately this clashes with the dplyrselect(). So be prepared to use dplyr::select() to get some things to work if you’re adapting the code for your own use.
Show the code
library(tidyverse) # tidy coding, ggplot etclibrary(data.table) # for the rleid functionlibrary(glue) # for coding within stringslibrary(bigsnpr) # to install: devtools::install_github("privefl/bigsnpr")#library(pander) # for slick simple tableslibrary(kableExtra) # for medium sized tableslibrary(DT) # for large, searchable tableslibrary(brms) # for bayesian modelslibrary(tidybayes) # for bayesian plotslibrary(ggtext) # for markdown syntax in ggplotlibrary(ggnewscale) # to reset colour scaleslibrary(MetBrewer) # for more colour paletteslibrary(MoMAColors) # nicer colours once againlibrary(PNWColors) # even more colours#library(hexbin) # for density heat maps#library(rcartocolor) # even more nice colourslibrary(patchwork) # for combining plots#library(ggrepel) # for labelling ggplotslibrary(pheatmap) # for heat mapslibrary(MASS) # needed for CPASSOClibrary(Matrix) # needed for CPASSOC#library(flexsurv) # for survival analysis#library(rptR) # for finding the intraclass correlation coefficient# build a helper function that produces a table to display our data# Create a function to build HTML searchable tablesmy_data_table <-function(df){datatable( df, rownames=FALSE,autoHideNavigation =TRUE,extensions =c("Scroller", "Buttons"),options =list(autoWidth =TRUE,dom ='Bfrtip',deferRender=TRUE,scrollX=TRUE, scrollY=1000,scrollCollapse=TRUE,buttons =list('pageLength', 'colvis', 'csv', list(extend ='pdf',pageSize ='A4',orientation ='landscape',filename ='Lifespan_data')),pageLength =100 ) )}
Load variant/gene annotations
DGRP variant annotations were downloaded from the DGRP website and gene annotations for all the genes covered by DGRP variants, from the org.Dm.eg.db database object from Bioconductor.
These will be useful later when we aim to identify whether variants with notable associations with a trait overlap with any genes.
Show the code
# Helper function to split a vector into chunks chunker <-function(x, max_chunk_size) split(x, ceiling(seq_along(x) / max_chunk_size))if(!file.exists("data/derived/annotations.csv")){# Load annotation file, get important info annot <-read.table("data/input/dgrp.fb557.annot.txt", header =FALSE, stringsAsFactors =FALSE) get.info <-function(rows){lapply(rows, function(row){ site.class.field <-strsplit(annot$V3[row], split ="]")[[1]][1] num.genes <-str_count(site.class.field, ";") +1 output <-cbind(rep(annot$V1[row], num.genes), do.call("rbind", lapply(strsplit(site.class.field, split =";")[[1]], function(x) strsplit(x, split ="[|]")[[1]])))if(ncol(output) ==5) return(output[,c(1,2,4,5)]) # only return SNPs that have some annotation. Don't get the gene symbolelsereturn(NULL) }) %>%do.call("rbind", .) } variant.details <-lapply(chunker(1:nrow(annot), max_chunk_size =10000), get.info) %>%do.call("rbind", .) %>%as.data.frame()names(variant.details) <-c("SNP", "FBID", "site.class", "distance.to.gene") variant.details$FBID <-unlist(str_extract_all(variant.details$FBID, "FBgn[:digit:]+")) # clean up text strings for Flybase ID variant.details %>% dplyr::filter(site.class !="FBgn0003638") %>%# NB this is a bug in the DGRP's annotation filemutate(chr =str_remove_all(substr(SNP, 1, 2), "_")) # get chromosome now for faster sorting later annotations <- variant.details} else annotations <-read_csv("data/derived/annotations.csv")annotations <- annotations %>%left_join(read.csv("data/Input/all_dmel_genes.csv")) %>% dplyr::select(SNP, FBID, site.class, distance.to.gene, gene_name, chromosome)
In the demographic component of this study, we calculated mean values and standard error for each combination of line, sex, study, temperature and mating status. These data are displayed, and can be downloaded from the below table. Note that for quantitative genetic, GWA and other SNP based analysis, we removed lines that had not been genotyped (shown as Genotyped = NO). Lines with unknown genotypes also have unknown Wolbachia and inversions status’. Durham et al (2014) cleared all lines of Wolbachia via treatment with tetracycline.
Show the code
genotyped_lines <-read_csv("data/input/Genotyped_lines.csv") %>%mutate(Genotyped ="YES",line =as.factor(line))full_dataset <-read.csv("data/input/lifespan_data.csv") %>%as_tibble() %>%mutate(line =as.factor(Line),Treatment =str_replace(Treatment, " ", "_"),Treatment =case_when(Temperature ==18& Study =="Huang_2020"~"Huang_2020_1", Temperature ==25& Study =="Huang_2020"~"Huang_2020_2", Temperature ==28& Study =="Huang_2020"~"Huang_2020_3",.default = Treatment)) %>% dplyr::select(-Line) %>%left_join(genotyped_lines, by ="line") %>%mutate(Genotyped =if_else(is.na(Genotyped), "NO", Genotyped)) %>% dplyr::select(line, Sex, Temperature, Mated, Study, Treatment, Block, e0, SE_e0, h, SE_h, samp, Genotyped)# DGRP studies often correct for the most common inversions and wolbachia presence. inversions_wolbachia <-read_csv("data/Input/inversions_wolbachia.csv") %>%mutate(line =as.factor(str_remove(line, "DGRP_")),Wolbachia =if_else(Wolbachia =="y", 1, 0),across(2:17, ~case_when(.x =="ST"~0, .x =="INV/ST"~1, .x =="INV"~2))) %>% dplyr::select(line, `In(2L)t`, `In(2R)NS`, `In(3R)P`, `In(3R)K`, `In(3R)Mo`, Wolbachia) %>%rename(In_2L_t =`In(2L)t`,In_2R_NS =`In(2R)NS`,In_3R_P =`In(3R)P`,In_3R_K =`In(3R)K`,In_3R_Mo =`In(3R)Mo`)# inversions pruned to those Huang et al 2015 PNAS corrected forfull_dataset <- full_dataset %>%left_join(inversions_wolbachia) %>%mutate(Wolbachia =if_else(Study =="Durham_2014", 0, Wolbachia)) # study cleared wolbachia with tetracycline before phenotyping my_data_table(full_dataset %>%mutate(across(8:11, ~round(.x, 2))) %>% dplyr::select(1:13))
\(\mathrm{CV}_G\)
The coefficient of genetic variation is
\[\mathrm{CV}_G = \frac{100\sqrt{\sigma^2_G}}{\overline{x}}\] where \(\sigma^2_G\) is the genetic variance in the trait of interest and \(\overline{x}\) is the mean trait value. This metric allows comparison of genetic variances between traits expressed on different scales. We use it here to get a compare the extent genetic variation in life expectancy and lifespan equality.
First, let’s calculate the \(\mathrm{CV}_G\) in life expectancy using individual-level data. While we’re at it, we can also calculate broad-sense heritability.
\(\mathrm{CV}_G\) in life expectancy calculated from individual level data is \(\approx\)\(\mathrm{CV}_G\) calculated from line means. We therefore treat these line mean estimates as reasonable and use them to compare life expectancy and lifespan equality.
Table SX. \(CV_G\) estimates for life expectancy and lifespan equality, estimated from genotype means.
The preparation of data for univariate GWAS generally follows Holman and Wong’s (2023) DGRP GWAS of fitness in different contexts. See their associated workflowrreport for a comprehensive breakdown of their data preparation.
Loading data used in GWA tests
For GWAS and later CPASSOC, we split the data by study, removed studies that phenotyped < 100 lines and adjusted line means to account for experimental block where applicable. Importantly, we also split the Wilson et al (2020) data by dietary treatment; while we do not explicitly consider diet in our analysis, lifespan in one dietary treatment is considered a separate trait from lifespan measured in a second dietary treatment.
Show the code
Arya_2010_f <- full_dataset %>%filter(Study =="Arya_2010"& Sex =="Female"& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Arya_2010_m <- full_dataset %>%filter(Study =="Arya_2010"& Sex =="Male"& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_f_18 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Female"& Temperature ==18& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_m_18 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Male"& Temperature ==18& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_f_25 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Female"& Temperature ==25& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_m_25 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Male"& Temperature ==25& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_f_28 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Female"& Temperature ==28& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_m_28 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Male"& Temperature ==28& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)# In this study, some lines were measured twice per treatment, and a small subset were measured three times. We take the mean across blocks as the line mean, following the original study.Wilson_2020_f_1 <- full_dataset %>%filter(Treatment =="Wilson_2020_1"& Genotyped =="YES") %>%group_by(line) %>%mutate(e0 =mean(e0),h =mean(h)) %>%ungroup() %>%distinct(line, .keep_all =TRUE) %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Wilson_2020_f_2 <- full_dataset %>%filter(Treatment =="Wilson_2020_2"& Genotyped =="YES") %>%group_by(line) %>%mutate(e0 =mean(e0),h =mean(h)) %>%ungroup() %>%distinct(line, .keep_all =TRUE) %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)# In this study, each line was measured three times. We take the mean across blocks as the line meanDurham_2014_f <- full_dataset %>%filter(Study =="Durham_2014"& Genotyped =="YES") %>%group_by(line) %>%mutate(e0 =mean(e0),h =mean(h)) %>%ungroup() %>%distinct(line, .keep_all =TRUE) %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Patel_2021_f <- full_dataset %>%filter(Study =="Patel_2021"& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)
Install neccessary software and build helper functions
In addition to the R packages we load, plink 1.9 and beagle must also be installed. These software packages allow us to wrangle the data into the correct format and impute missing values, both of which are required to run GWAS with the plink.
plink is run from the terminal, but we pass the terminal command to R first, which then writes to the terminal. This makes our analysis reproducible. However, Windows and mac operating systems liase with the terminal differently, meaning different functions are required depending on your operating system. To make this easy we include the following code chunk, where you can specify whether you’re a windows or mac user.
Show the code
Operating_system <-"mac"# change this to "windows" if appropriate. Note that all downstream functions are informed by this
Show the code
# build functions to prepare data for GWASprep_for_e0_GWAS <-function(data, sex){data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, e0)}prep_for_h_GWAS <-function(data, sex){data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, h)}prep_for_ageing_GWAS <-function(data){ data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, ageing_axis_centered)}prep_for_baseline_mortality_GWAS <-function(data){ data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, baseline_mortality_axis_centered)}# I used bigsnpr::download_plink(dir = "code/windows") which puts it in the code folder - I'm using a windows operating system. The macOS version can also be downloaded into "code/macOS" # Beagle is a software package for phasing genotypes and imputing ungenotyped markers.if(Operating_system =="mac"){plink <-paste(getwd(), "code/macOS/plink", sep ="/")}if(Operating_system =="windows"){plink <-paste(getwd(), "code/windows/plink", sep ="/")}# only need to download this once - change path depending on operating system#beagle <- bigsnpr::download_beagle(# dir = "/Users/tkeaney/Library/CloudStorage/OneDrive-JGU/Postdoc/DGRP_lifespan/DGRP_lifespan_inequality/code/macOS") # helper function to pass commands to the terminal# Note that we set `intern = TRUE`, and pass the result of `system()` to `cat()`, # ensuring that the Terminal output will be printed in this knitr report.# # This is the mac OS functionif(Operating_system =="mac"){ run_command_mac <-function(shell_command, wd =getwd(), path =""){cat(system(glue("cd ", wd, path, # tell terminal which directory to work in"\n",shell_command), # on a second terminal line, run the plink commandintern =TRUE), sep ='\n') }}# This is the windows function if(Operating_system =="windows"){ run_command_windows <-function(plink_command, wd =getwd(), path ="") {# Specify the full path to the plink executable within the 'code' subdirectory. plink_path <-paste(getwd(), "code/windows/plink", sep ="/")# Create the full shell command with the plink executable. command <-glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shQuote(plink_path)} {plink_command}")# Execute the combined command. result <-system(command, intern =TRUE)# Print the result.cat(result, sep ='\n')# Return the result as a character vector.return(result) }# sometimes we need to run terminal commands without plink - for windows, create a slightly different function to do this run_command_no_plink <-function(shell_command, wd =getwd(), path ="") {# Create the full shell command with the plink executable. command <-glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shell_command}")# Execute the combined command. result <-system(command, intern =TRUE)# Print the result.cat(result, sep ='\n')# Return the result as a character vector.return(result) }}
Perform SNP quality control and impute missing data
Plink recognises three types of files that are necessary for GWA analysis: the .bed, .bim and .fam files.
.bed: the binary biallelic genotype table. Four options are possible:
00 = homozygous for minor allele
01 = missing genotype
10 = heterozygous genotype
11 = homozygous for major allele
The overwhelming majority of genotypes in the DGRP are homozygous for one of the alleles (i.e. 00 or 11).
.bim: extended variant information file accompanying the .bed file. It has six fields:
chromosome code
variant identifier
position in morgans
base-pair coordinate
Minor allele
Major allele
.fam: Plink sample information file. It can have the following elements:
Family ID (‘FID’) (in our case just the DGRP line)
Within-family ID (‘IID’; cannot be ‘0’) (in our case just the DGRP line)
Within-family ID of father (‘0’ if father isn’t in dataset)
Within-family ID of mother (‘0’ if mother isn’t in dataset)
Sex code (‘1’ = male, ‘2’ = female, ‘0’ = unknown) - not important for us because we analyse the sexes separately.
Phenotype value (‘1’ = control, ‘2’ = case, ‘-9’/‘0’/non-numeric = missing data): -9 for us because we supply the phenotypic data later.
We cleaned up the DGRP’s .bed/.bim/.fam files (available from the Mackay lab website) by:
Removing any SNPs for which genotypes are missing in >10% of the 205 DGRP lines. We then use the software Beagle to impute the remaining missing genotypes. Imputation takes about half an hour, so ideally you only want to do it once.
Removing SNPs with a minor allele frequency of less than 5% across the 205 lines. We have negligible power to detect associations for rare SNPs that occur at frequencies below this threshold.
In the plink-formatted genotype files, lines fixed for the major allele are coded as 2, and lines fixed for the minor allele as 0. SNPs with negative \(\beta\) coefficients therefore indicate that the minor allele is associated with higher trait values, while positive effect sizes means that the minor allele is associated with lower trait values.
Show the code
Run_function <-FALSE# Change this to TRUE to run - read through the code before you do this if(Run_function){# Use Plink to clean and subset the DGRP's SNP data as follows:# Only keep SNPs for which at least 90% of the 205 DGRP lines were successfully genotyped (--geno 0.1)# Only keep SNPs with a minor allele frequency of 0.05 or higher (--maf 0.05) across the 205 lines# Write the processed BIM/BED/FAM files to the data/Derived/plink_output directory output_directory <-paste(getwd(), "data/Derived/plink_output", sep ="/")if(Operating_system =="windows"){run_command_windows(glue("--bfile dgrp2"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out {shQuote(output_directory)}/dgrp2_QC_all_lines"), path ="data/Input/bfiles/") }if(Operating_system =="mac"){run_command_mac(glue("{plink} --bfile dgrp2"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out ../dgrp2_QC_all_lines"), path ="/data/input/bfiles/") }# Use the shell command 'sed' to remove underscores from the DGRP line names in the .fam file (e.g. 'line_120' becomes 'line120')# Otherwise, these underscores cause trouble when we need to convert from PLINK to vcf format (vcf format uses underscore as a separator)if(Operating_system =="windows"){ for(i in1:2) run_command_no_plink("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path ="/data/Derived/") }if(Operating_system =="mac"){ for(i in1:2) run_command_mac("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path ="/data/Derived/") }# Now impute the missing genotypes using Beagle# This part uses the data for the full DGRP panel of >200 lines, to infer missing genotypes as accurately as possible. # The bigsnpr package provides a helpful wrapper for Beagle called snp_beagleImpute(): it translates to a VCF file and back again using PLINK# Imputation with the below optimisation took about 25 mins on my computer, which is a high spec macbook by 2025 standardssnp_beagleImpute(beagle, plink, bedfile.in ="data/Derived/plink_output/dgrp2_QC_all_lines.bed", bedfile.out ="data/Derived/plink_output/dgrp2_QC_all_lines_imputed.bed",ncores =10, memory.max =32)# assign a sex of 'female' to all the DGRP lines (Beagle removes the sex, and it seems PLINK needs individuals to have a sex, # despite PLINK having a command called --allow-no-sex)if(Operating_system =="windows"){ run_command_windows("sed -i '' 's/ 0 0 0/ 0 0 2/' dgrp2_QC_all_lines_imputed.fam", path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac("sed -i '' 's/ 0 0 0/ 0 0 2/' dgrp2_QC_all_lines_imputed.fam", path ="/data/Derived/plink_output/") }# Re-write the .bed file, to make sure the MAF threshold and minor/major allele designations are correctly assigned post-Beagleif(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path ="/data/Derived/plink_output/") }#unlink(list.files("data/derived", pattern = "~", full.names = TRUE)) # delete the original files, which were given a ~ file name by PLINK}
Get minor allele frequencies in the DGRP
Show the code
# Use PLINK to get the allele IDs and calculate the MAFs across the whole DGRP, for all SNPs that survived QC# The file created is called data/derived/plink.frqif(!file.exists("data/Derived/plink_output/plink.frq")){if(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct"," --freq"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct"," --freq"), path ="/data/Derived/plink_output/") }}# Extract and save the MAFs, SNP positions, and major/minor allelesMAFs <-read.table("data/Derived/plink_output/plink.frq", header =TRUE, stringsAsFactors =FALSE) %>%mutate(position =map_chr(strsplit(SNP, split ="_"), function(x) x[2])) %>% dplyr::select(SNP, position, MAF, A1, A2) %>%rename(minor_allele = A1,major_allele = A2) %>%as_tibble()
Create a reduced list of LD-pruned SNPs with PLINK
1,646,615 variants passed the MAF and missingness quality control. However, proximity causes strong linkage disequilibrium, such that neighbouring SNPs tend to have similar associations with the trait under consideration in GWAS. Separate genomic regions can be identified by pruning the number of SNPs within a genomic region using the plink arguments --indep-pairwise 100 10 0.2, which prunes SNPs within 100kB sliding windows, sliding 10 variants along with each step, and allows a maximum pairwise correlation (\(r^2\)) threshold of 0.2 between loci within the window. With these parameters, 1,419,773 variants were removed, leaving 226,842.
Show the code
# indep-pairwise arguments are: # 100kB window size, # variant count to shift the window by 10 variants at the end of each step, # pairwise r^2 threshold of 0.2if(!file.exists("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.out")) {if(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct"," --indep-pairwise 100 10 0.2"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct"," --indep-pairwise 100 10 0.2"), path ="/data/Derived/plink_output/") }}Genomic_regions <-read.table("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in") %>%rename(SNP = V1)
Build GWAS function
We use the --assoc flag to run a basic linear regression, fit with two data points: the mean phenotype for individuals homozygous for the minor allele and the mean phenotype for individuals homozygous for the major allele. The effect size (\(\beta\)) is the slope of this regression line. Negative effect sizes indicate that the minor allele is associated with a higher trait value than the major allele. The test statistic T and p-value are produced by a Wald test.
Show the code
run_GWAS <-function(phenotypes){# Make a list of the lines in our sample and save as a text file for passing to PLINK lines_to_keep <- phenotypes %>% dplyr::select(line) %>%mutate(line_2 = line)write.table(lines_to_keep, row.names =FALSE, col.names =FALSE, file ="data/Derived/plink_output/lines_to_keep.txt", quote =FALSE)# Now cull the PLINK files to just the lines that we measured, and re-apply the # MAF cut-off of 0.05 for the new smaller sample of DGRP linesif(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct"," --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole" --keep lines_to_keep.txt --geno 0.1 --maf 0.05", " --make-bed --out dgrp2_QC_focal_lines"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct"," --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole" --keep lines_to_keep.txt --geno 0.1 --maf 0.05", " --make-bed --out dgrp2_QC_focal_lines"), path ="/data/Derived/plink_output/") }# Define a function to add our phenotype data to a .fam file, which is needed for GWAS analysis and to make sure PLINK includes these samples# The 'phenotypes' data frame needs to have a column called 'line' add_phenotypes_to_fam <-function(filepath, phenotypes){read_delim(filepath, col_names =FALSE, delim =" ") %>% dplyr::select(X1, X2, X3, X4, X5) %>%# Get all the non-phenotype columnsleft_join(phenotypes, by =c("X1"="line")) %>%write.table(file ="data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", col.names =FALSE, row.names =FALSE, quote =FALSE, sep =" ")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")file.rename("data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", "data/Derived/plink_output/dgrp2_QC_focal_lines.fam") }# edit the new FAM file to add the phenotype data from 'phenotypes'add_phenotypes_to_fam("data/Derived/plink_output/dgrp2_QC_focal_lines.fam", phenotypes)# Run GWAS if(Operating_system =="windows"){ run_command_windows("--bfile dgrp2_QC_focal_lines --assoc --maf 0.05 --allow-no-sex", path ="/data/Derived/plink_output") }if(Operating_system =="mac"){ run_command_mac("{plink} --bfile dgrp2_QC_focal_lines --assoc --maf 0.05 --allow-no-sex", path ="/data/Derived/plink_output") }# wrangle the GWAS results Focal_name <-deparse(substitute(phenotypes)) gwas_results <-read.table("data/Derived/plink_output/plink.qassoc", header =TRUE) %>% dplyr::select(SNP, BETA, SE, "T", P)# Rename and compress the GWAS summary stats file # The filter step means that only variants in the LD-pruned subset get saved to disk. gwas_results %>%# filter(SNP %in% (pull(read_tsv("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in", col_names = "SNP"), SNP))) %>% write_tsv(glue("data/Derived/GWAS_results/{Focal_name}.tsv.gz"))unlink("data/Derived/plink_output/plink.qassoc")# Rename the plink log filefile.rename("data/Derived/plink_output/plink.log",glue("data/Derived/plink_output/{Focal_name}_log.txt"))unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bim")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bed")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.log")}
Build manhattan plot function
Show the code
build_manhattan_plot <-function(gwas_results){ manhattan_data <- gwas_results %>%mutate(position =str_split(SNP, "_"), # split the SNP name into logical bitschr =map_chr(position, ~ .x[1]), # the first bit is the chromosome arm - name the column appropriatelyposition =as.numeric(map_chr(position, ~ .x[2])), # where on the chromosome is the SNP foundpval =-1*log10(P)) %>%# make visualising the p values easier dplyr::select(chr, position, pval) %>%filter(chr !="4")# this next chunk finds convenient cuts for labels and colour changes max_pos <- manhattan_data %>%group_by(chr) %>%summarise(max_pos =max(position), middle_pos = (max_pos -min(position)) /2,.groups ="drop") %>%as.data.frame() max_pos$max_pos <-c(0, cumsum(max_pos$max_pos[1:4])) max_pos$label_pos <- max_pos$max_pos + max_pos$middle_pos# combine the two dataframes created above manhattan_data <- manhattan_data %>%left_join(max_pos, by ="chr") %>%mutate(position = position + max_pos,chromosome_colour =case_when(chr =="2L"| chr =="3L"| chr =="X"~"A",.default ="B"),Notable =if_else(pval >=-log10(1e-08), "YES", "NO")) plot <- manhattan_data %>%filter(Notable =="NO") %>%ggplot(aes(position, pval, group = chr, stroke =0.01)) +geom_point(aes(colour = chromosome_colour), size =1.8, alpha =1) +geom_point(data =manhattan_data %>%filter(Notable =="YES"),aes(fill = chromosome_colour), colour ="black", shape =21, size =3.5, alpha =1) +geom_hline(yintercept =-log10(1e-08), linetype =2, colour ="#33271e", linewidth =1, alpha =0.8) +#geom_hline(yintercept = -log10(1e-05), linetype = 2, colour = "#33271e", linewidth = 1, alpha = 0.8) +scale_colour_manual(values =c(met.brewer(name ="Hokusai3")[3], met.brewer(name ="Hokusai3")[6])) +scale_fill_manual(values =c(met.brewer(name ="Hokusai3")[3], met.brewer(name ="Hokusai3")[6])) +scale_x_continuous(breaks = max_pos$label_pos, labels = max_pos$chr) +ylab("-log~10~(_p_)") +xlab("Chromosome and position") +theme_classic() +theme(legend.position ="none",axis.title.y =element_markdown(size =14),axis.title.x =element_markdown(size =14),axis.text.x =element_text(size =12),axis.text.y =element_text(size =12)) }
Preparing for cross phenotype meta-analysis
The power to detect variants associated with genetically correlated phenotypes can be increased if a meta-analytic approach is adopted (Zhu et al. 2018). Here, we used the cross-phenotype association (hereafter CPASSOC) approach developed by Zhou et al. (2015), which evaluates the null hypothesis that SNPs are not associated with any of the considered traits, weighted by the sample size of each study and adjusted for the trait genetic correlation matrix. In less language, CPASSOC evaluates the aggregated evidence of an association between a SNP and multiple phenotypes. The steps to apply CPASSOC are as follows:
Estimate \(R\), the trait correlation matrix. In the DGRP, this can be done using SNP data (pruned to minimise the effect of linkage disequilibrium) or using line means.
GWAS each trait separately (see above).
Collate effect sizes for each trait into a vector \(\mathbf{\beta}\) for each SNP.
Use a Wald test to estimate a test statistic \(T_{ijk}\) for the \(i^{th}\) SNP, \(j^{th}\) cohort and \(k^{th}\) treatment condition:
\[T_{ijk} = \frac{\hat\beta_{ijk}}{\hat{s}_{ijk}}\] , where \(\hat\beta_{ijk}\) and \(\hat{s}_{ijk}\) are the estimated coefficient and standard error for the \(i^{th}\) SNP in the \(j^{th}\), for the \(k^{th}\) treatment condition. From individual test statistics, a vector holding test statistics for all traits (\(T\)) can be built.
Test whether \(\mathbf{\beta} = \mathbf{0}\). If the trait data are homogeneous (SNPs are expected to affect all traits in the same direction and at the same magnitude), use:
\[S_{Hom} = \frac{e^T(RW)^{-1}T(e^T(RW)^{-1}T)^T}{e^T(WRW)^{-1}e}\] where \(W\) is a diagonal matrix of weights for the individual test statistics (either the inverse of the variance or in our case the square root of the sample size of the \(j^{th}\) cohort: \(\sqrt{n_j}\)).
If there is heterogeneity between trait measures (i.e. it is a reasonable expectation that genetic variants could affect some traits in one direction and others in the opposing direction), \(S_{Hom}\) is not appropriate. The ideal test statistic in this case is agnostic to the sign of a genetic variant’s phenoypic effect and includes only the cohorts and traits with a true contribution to the association of a genetic variant. To implement this, the absolute value \(\tau\) is used as a threshold, below which traits are not included in the test statistic. To allow for for effects of different signs in different environmental contexts, let \(w_{ijk} = \sqrt{n_j}\times \mathrm{sign}(T_{ijk})\). To calculate this heterogenous summary statistic first find,
This statistic, \(S_{Het}\) can be viewed as the maximum of the weighted sum of trait-specific test statistics that satisfy the \(\tau\) threshold.
When \(\tau\) is large, \(S(\tau)\) can be undefined if the test statistic falls below \(\tau\) in all contexts and cohorts. In this case \(S(\tau) = 0\). Our test statistic is then
\[\displaystyle S_{Het} = \max_{\tau \gt 0} S(\tau)\] Note that the restriction imposed by \(\tau\) and the sign specific \(w_{ijk}\) are the only differences between \(S_{Het}\) and \(S_{Hom}\).
The inclusion of \(\tau\) might give the impression of ‘cherry picking’. However, the value of \(S_{Het}\) lies in increasing power relative to univariate GWAS not by assessing if all tests return a strong, concordant association, but by identifying that more than one phenotype is strongly associated with a SNP. The more phenotypes associated, the larger \(S_{Het}\) becomes.
To generate p-values, \(S_{Het}\) is compared to a gamma distribution with a mean shift of test-statistics (see Zhou et al. (2015)).
Code to implement both statistics in R can be downloaded here, and is called below.
As a point of comparison, we find the sum of significant associations detected by univariate GWAS
Table SX. Genotype to phenotype associations detected by univariate GWAS, for life expectancy. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.
Table SX. Genotype to phenotype associations detected by univariate GWAS, for lifespan equality. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.
We calculate the genetic correlations between traits using both the line mean and SNP effect size comparisons. Following Zhu et al. (2015), we use the SNP correlations for analysis.
The purpose of this meta-analysis is to search for SNPs that have some effect on life expectancy or lifespan equality, expressed in many different contexts (sexes, temperatures and mating status’).
To conduct CPASSOC for a given SNP, we need a \(T\) statistic from each environmental context. A different number of lines were included in each GWAS, which caused small differences in the number of SNPs assessed in each cohort. We therefore prune the list of SNPs to those included in all univariate analyses. After pruning, 1,603,213 SNPs remain.
The Bonferroni adjusted significance threshold for this number of tests is \(p_{adj} = \frac{0.05}{1603213} = 3.12\times 10^{-8}\); here and for all future analysis, we use p\(< 10^{-8}\) as our significance threshold, as this is slightly more conservative and easier to quickly interpret.
Life expectancy
Show the code
Arya_f_l_T <- Arya_f_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)Huang_f_18_l_T <- Huang_f_18_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)Huang_f_25_l_T <- Huang_f_25_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)Huang_f_28_l_T <- Huang_f_28_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)Wilson_f_l_1_T <- Wilson_f_l_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_1 = T)Wilson_f_l_2_T <- Wilson_f_l_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_2 = T)Durham_f_l_T <- Durham_f_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f_25 = T)Patel_f_l_T <- Patel_f_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f_23 = T)Arya_m_l_T <- Arya_m_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)Huang_m_18_l_T <- Huang_m_18_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)Huang_m_25_l_T <- Huang_m_25_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)Huang_m_28_l_T <- Huang_m_28_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)all_e0_t_stats <- Arya_f_l_T %>%inner_join(Huang_f_18_l_T, by ="SNP") %>%inner_join(Huang_f_25_l_T, by ="SNP") %>%inner_join(Huang_f_28_l_T, by ="SNP") %>%inner_join(Wilson_f_l_1_T, by ="SNP") %>%inner_join(Wilson_f_l_2_T, by ="SNP") %>%inner_join(Durham_f_l_T, by ="SNP") %>%inner_join(Patel_f_l_T, by ="SNP") %>%inner_join(Arya_m_l_T, by ="SNP") %>%inner_join(Huang_m_18_l_T, by ="SNP") %>%inner_join(Huang_m_25_l_T, by ="SNP") %>%inner_join(Huang_m_28_l_T, by ="SNP")all_e0_t_stats_values <- all_e0_t_stats %>% dplyr::select(2:13)Sample_size_all <-c(165, 183, 186, 177, 161, 161, 176, 193, 165, 183, 186, 177) if(!file.exists("data/Derived/GWAS_results/all_e0_meta_results.csv")) {# run the homogeneous effect meta-analysisS_hom <-SHom(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_all, SNP_e0_corr_matrix);S_het <-SHet(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)# bind meta statistics with the univariate effect sizesall_e0_meta_results <- all_e0_t_stats %>%bind_cols(p_S_hom, p_S_het) write_csv(all_e0_meta_results, "data/Derived/GWAS_results/all_e0_meta_results.csv")} else all_e0_meta_results <-read_csv("data/Derived/GWAS_results/all_e0_meta_results.csv")
Lifespan equality
Show the code
Arya_f_h_T <- Arya_f_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)Huang_f_18_h_T <- Huang_f_18_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)Huang_f_25_h_T <- Huang_f_25_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)Huang_f_28_h_T <- Huang_f_28_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)Wilson_f_h_1_T <- Wilson_f_h_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_1 = T)Wilson_f_h_2_T <- Wilson_f_h_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_2 = T)Durham_f_h_T <- Durham_f_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f_25 = T)Patel_f_h_T <- Patel_f_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f_23 = T)Arya_m_h_T <- Arya_m_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)Huang_m_18_h_T <- Huang_m_18_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)Huang_m_25_h_T <- Huang_m_25_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)Huang_m_28_h_T <- Huang_m_28_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)all_h_t_stats <- Arya_f_h_T %>%inner_join(Huang_f_18_h_T, by ="SNP") %>%inner_join(Huang_f_25_h_T, by ="SNP") %>%inner_join(Huang_f_28_h_T, by ="SNP") %>%inner_join(Wilson_f_h_1_T, by ="SNP") %>%inner_join(Wilson_f_h_2_T, by ="SNP") %>%inner_join(Durham_f_h_T, by ="SNP") %>%inner_join(Patel_f_h_T, by ="SNP") %>%inner_join(Arya_m_h_T, by ="SNP") %>%inner_join(Huang_m_18_h_T, by ="SNP") %>%inner_join(Huang_m_25_h_T, by ="SNP") %>%inner_join(Huang_m_28_h_T, by ="SNP") all_h_t_stats_values <- all_h_t_stats %>% dplyr::select(2:13)if(!file.exists("data/Derived/GWAS_results/all_h_meta_results.csv")) {S_hom <-SHom(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_all, SNP_h_corr_matrix);S_het <-SHet(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)# bind meta statistics with the univariate effect sizesall_h_meta_results <- all_h_t_stats %>%bind_cols(p_S_hom, p_S_het)write_csv(all_h_meta_results, "data/Derived/GWAS_results/all_h_meta_results.csv")} else all_h_meta_results <-read_csv("data/Derived/GWAS_results/all_h_meta_results.csv")
Visualise the results
We combine GWAS summary statistics calculated from lifespan data measured across different contexts. It’s likely that some SNPs have G x E interactions that would lead to a heterogeneous effect across treatments. We therefore utilise the S_het calculated p-values.
First lets show the effect of CPASSOC on the number of variants found to be associated with life expectancy and lifespan equality.
Table SX. the number of variants associated with life expectancy and lifespan equality at various significance thresholds, estimated by univariate GWAS or CPASSOC.
Table SX. genes that encompass or are very close to the genetic variants that have strong associations with life expectancy.
Show the code
# join gene annotations with the list of analysed variants # note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.life_expectancy_variants <- all_e0_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^18, 3)/10^18,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)life_expectancy_variants %>%my_data_table()
Table SX. genes that encompass or are very close to the genetic variants that have strong associations with lifespan equality.
Show the code
# join gene annotations with the list of analysed variants # note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.lifespan_equality_variants <- all_h_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^15, 3)/10^15,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)lifespan_equality_variants %>%my_data_table()
Now build some ‘Manhattan plots’ to show where these significant associations can be found:
Figure XX. Manhattan plot showing the -Log10p-value for each genomic region’s effect on A) life expectancy and B) lifespan equality.
Plot the univariate effect sizes for each of the regions associated with life expectancy / lifespan equality at the genome-wide significance threshold (p < \(0.05^{-8}\)) after CPASSOC.
Life expectancy
Show the code
SNP_heatmap_e0 <- SNP_beta_e0 %>%inner_join( all_e0_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%inner_join(Genomic_regions))row_name <- SNP_heatmap_e0$SNPSNP_heatmap_e0 <- SNP_heatmap_e0 %>% dplyr::select(-SNP) %>%as.matrix()rownames(SNP_heatmap_e0) <- row_namebreaksList <-seq(-7, 7, by =0.01)annotation_SNPs <- all_e0_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%mutate(Chromosome =case_when(str_detect(SNP, "2L") ~"2L",str_detect(SNP, "2R") ~"2R",str_detect(SNP, "3L") ~"3L",str_detect(SNP, "3R") ~"3R",str_detect(SNP, "X") ~"X"))annotation_studies <-tibble(Study =c("Arya_f_25","Huang_f_18","Huang_f_25","Huang_f_28","Wilson_f_25_1","Wilson_f_25_2","Durham_f_25","Patel_f_23","Arya_m_25","Huang_m_18","Huang_m_25","Huang_m_28"),Temperature =c("25","18","25","28","25","25","25","23","25","18","25","28")) %>%mutate(Sex =case_when(str_detect(Study, "_f") ~"Female",.default ="Male"),Mating =case_when(str_detect(Study, "Arya") ~"NO",str_detect(Study, "Huang") ~"Throughout life",str_detect(Study, "Wilson") ~"Early life",str_detect(Study, "Durham") ~"Throughout life",str_detect(Study, "Patel") ~"Early life"),Author =str_extract(Study, ".*(?=\\_)"),Author =str_remove(Author, "_f"),Author =str_remove(Author, "_m"))# create a study annotation column, need this to be a data.frame rather than a tibble for some reason Study_details <- annotation_studies %>%as.data.frame() %>% dplyr::select(Study, Temperature, Mating)my_categories <-data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],Mating = Study_details[, 3])my_colors <-list(Temperature =c("18"="#7bbcd5", # sailboat colours from pnw"23"="#d0e2af","25"="#f5db99","28"="#e89c81"),Mating =c("NO"="#f8e3d1", # Shuksan from pnw"Early life"="#d7b1c5","Throughout life"="#ac8eab"),Chromosome =c("2L"="#d8aedd", # lake colours from pnw"2R"="#cb74ad","3L"="#11c2b5","3R"="#72e1e1","X"="#fbcc74"))# create a SNP annotation columnSNP_details <- annotation_SNPs %>%as.data.frame()my_SNP_categories <-data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])my_col_names <-c("Arya et al females", "Huang et al females", "Huang et al females","Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females","Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males","Huang et al males")pheatmap(SNP_heatmap_e0, breaks = breaksList, main ="",color =colorRampPalette(rev(met.brewer("Benedictus", direction =1)))(length(breaksList)),legend =TRUE, cutree_rows =6, cutree_cols =5, angle_col =45, border_color ="white",annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,fontsize =8, labels_col = my_col_names)
Figure SX. univariate effect sizes for each of the genomic regions associated with life expectancy at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC. Effect sizes are expressed in days added to life expectancy per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.
Figure XX. univariate effect sizes for each of the genomic regions associated with lifespan equality at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC. Effect sizes are expressed in equality added per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.
Analysing the rate of ageing and baseline mortality
Axes of ageing rate and baseline mortality
We’ve shown that orthogonal deviation from the regression of lifespan equality on life expectancy closely corresponds to the rate of ageing (\(\beta\)) parameter in a Gompertz-Makeham mortality model. To identify regions of the genome associated with the rate of ageing, we can calculate a rate of ageing index for each line in each treatment. To create this index, we rotate the coordinate system of the life expectancy and lifespan equality plane by \(\theta\) degrees, where \(\theta\) is the angle between the positive x-axis and the regression of lifespan equality on life expectancy.
We then rotated the coordinate system of the life expectancy and lifespan equality plane clockwise by \(\theta\):
\[x' = -(x\cos(\theta) + y\sin(\theta))\]
\[y' = -(x\sin(\theta) - y\cos(\theta))\]
where \(x'\) and \(y'\) are the vectors of genotype means for baseline mortality rate and ageing rate, and \(x\) and \(y\) are vectors of genotype means for life expectancy and lifespan equality. We perform this transformation on the unscaled data.
Finally, simulate curves from for the Gompertz-Makeham distribution to show the correlation between the \(\alpha\) and \(\beta\) parametrs and our baseline mortality and ageing rate proxies.
Show the code
# script to draw h~e0 for different gompertz b# a sequencea_seq <-seq(-30,2,0.02)# b sequenceb_seq <-seq(-5,-0.5,0.5)b_seq <-exp(b_seq)gomp_seq <-data.frame(b=NULL,e0=NULL,h=NULL)age_seq <-seq(0,10000,0.1)Run_sim <-FALSE# change to TRUE to run the simif(Run_sim){for(i in1:length(b_seq)){for (j in1:length(a_seq)){ lx <-exp(-exp(a_seq[j])/b_seq[i]*(exp(b_seq[i]*age_seq)-1)) lx <- lx[lx!=0]if(tail(lx,1)<0.1){ e0_gomp <-sum(lx)*0.1 disparity <--sum(lx*log(lx))*0.1 h_gomp <--log(disparity/e0_gomp) } gomp_seq <-rbind(gomp_seq,c(b_seq[i],e0_gomp,h_gomp)) } }write_csv(gomp_seq, "data/Derived/gompertz_simulation.csv")} else{ gomp_seq <-read_csv("data/Derived/gompertz_simulation.csv")}names(gomp_seq) <-c("b","e0","h")gomp_seq$b <-log(gomp_seq$b)gomp_seq$b <-as.factor(gomp_seq$b)
Plot the line means, coloured by their value on the ageing rate axis.
(a | b | c) / (d | e | f) / (g | h| i) / (j | k | l) +#guide_area()) +plot_layout(guides ='collect')
Figure SX. Points show DGRP lines, shaded by their genotypic values for the rate of ageing. Dashed curves show simulation outcomes from a Gompertz-Makeham distribution: the rate of ageing differs between curves but is fixed within them, where the baseline mortality decreases as curves progress to the right. Note that colour shows the rate of ageing relative to the mean within the treatment.
Table SX. Genotype to phenotype associations detected by univariate GWAS, for the rate of ageing. The number of genomic regions indicates the number of genetic variants associated with the rate of ageing after LD pruning. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.
Table SX. Genotype to phenotype associations detected by univariate GWAS, for baseline mortality rate. The number of genomic regions indicates the number of genetic variants associated with baseline mortality after LD pruning. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.
Using SNP effect sizes, we calculate the genetic correlations between a) rates of ageing and b) baseline mortality, measured in different environmental contexts.
Show the code
# use the BETA coefficients to build the SNP correlation matrix for the rate of ageingSNP_ageing_axis_data <-bind_rows( Arya_f_ageing_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Female"), Arya_m_ageing_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Male"), Huang_f_18_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Female"), Huang_m_18_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Male"), Huang_f_25_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Female"), Huang_m_25_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Male"), Huang_f_28_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Female"), Huang_m_28_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Male"), Wilson_f_ageing_1_GWAS %>%mutate(Study ="Wilson_2020_1", Temperature =25, Sex ="Female"), Wilson_f_ageing_2_GWAS %>%mutate(Study ="Wilson_2020_2", Temperature =25, Sex ="Female"), Durham_f_ageing_GWAS %>%mutate(Study ="Durham_2014", Temperature =25, Sex ="Female"), Patel_f_ageing_GWAS %>%mutate(Study ="Patel_2021", Temperature =23, Sex ="Female")) %>% dplyr::select(SNP, BETA, Study, Temperature, Sex) %>%pivot_wider(values_from = BETA, names_from =c(Study, Temperature, Sex)) SNP_ageing_axis_LD_pruned <- SNP_ageing_axis_data %>%inner_join(Genomic_regions)SNP_ageing_axis_corr_matrix <-cor(SNP_ageing_axis_LD_pruned %>% dplyr::select(-SNP), use ="pairwise.complete.obs", method ="spearman")# use the BETA coefficients to build the SNP correlation matrix for scalingSNP_baseline_mortality_axis_data <-bind_rows( Arya_f_baseline_mortality_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Female"), Arya_m_baseline_mortality_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Male"), Huang_f_18_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Female"), Huang_m_18_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Male"), Huang_f_25_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Female"), Huang_m_25_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Male"), Huang_f_28_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Female"), Huang_m_28_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Male"), Wilson_f_baseline_mortality_1_GWAS %>%mutate(Study ="Wilson_2020_1", Temperature =25, Sex ="Female"), Wilson_f_baseline_mortality_2_GWAS %>%mutate(Study ="Wilson_2020_2", Temperature =25, Sex ="Female"), Durham_f_baseline_mortality_GWAS %>%mutate(Study ="Durham", Temperature =25, Sex ="Female"), Patel_f_baseline_mortality_GWAS %>%mutate(Study ="Patel", Temperature =23, Sex ="Female")) %>% dplyr::select(SNP, BETA, Study, Temperature, Sex) %>%pivot_wider(values_from = BETA, names_from =c(Study, Temperature, Sex))SNP_baseline_mortality_axis_LD_pruned <- SNP_baseline_mortality_axis_data %>%inner_join(Genomic_regions)SNP_baseline_mortality_axis_corr_matrix <-cor(SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(-SNP), use ="pairwise.complete.obs", method ="spearman")
Calculate meta-analytic test statistics
The purpose of these meta-analyses is to detect SNPs associated with 1) the rate of ageing and 2) baseline mortality rate.
Run CPASSOC for the rate of ageing
Show the code
# rate of ageingageing_axis_Arya_f_T <- Arya_f_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)ageing_axis_Arya_m_T <- Arya_m_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)ageing_axis_Huang_f_18_T <- Huang_f_18_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)ageing_axis_Huang_m_18_T <- Huang_m_18_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)ageing_axis_Huang_f_25_T <- Huang_f_25_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)ageing_axis_Huang_m_25_T <- Huang_m_25_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)ageing_axis_Huang_f_28_T <- Huang_f_28_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)ageing_axis_Huang_m_28_T <- Huang_m_28_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)ageing_axis_Wilson_f_1_T <- Wilson_f_ageing_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_1 = T)ageing_axis_Wilson_f_2_T <- Wilson_f_ageing_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_2 = T)ageing_axis_Durham_f_T <- Durham_f_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f = T)ageing_axis_Patel_f_T <- Patel_f_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f = T)ageing_axis_t_stats <- ageing_axis_Arya_f_T %>%inner_join(ageing_axis_Arya_m_T, by ="SNP") %>%inner_join(ageing_axis_Huang_f_18_T, by ="SNP") %>%inner_join(ageing_axis_Huang_m_18_T, by ="SNP") %>%inner_join(ageing_axis_Huang_f_25_T, by ="SNP") %>%inner_join(ageing_axis_Huang_m_25_T, by ="SNP") %>%inner_join(ageing_axis_Huang_f_28_T, by ="SNP") %>%inner_join(ageing_axis_Huang_m_28_T, by ="SNP") %>%inner_join(ageing_axis_Wilson_f_1_T, by ="SNP") %>%inner_join(ageing_axis_Wilson_f_2_T, by ="SNP") %>%inner_join(ageing_axis_Durham_f_T, by ="SNP") %>%inner_join(ageing_axis_Patel_f_T, by ="SNP") ageing_axis_t_stat_values <- ageing_axis_t_stats %>% dplyr::select(2:13)Sample_size_ageing_axis <-c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)if(!file.exists("data/Derived/GWAS_results/ageing_axis_meta_results.csv")) {# run the homogeneous effect meta-analysisS_hom <-SHom(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits e.g. if a SNP has a sex or environment opposite effect on lifespan# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix);S_het <-SHet(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)ageing_axis_meta_results <- ageing_axis_t_stats %>%bind_cols(p_S_hom, p_S_het) # add the unadjusted p valueswrite_csv(ageing_axis_meta_results, "data/Derived/GWAS_results/ageing_axis_meta_results.csv")} else ageing_axis_meta_results <-read_csv("data/Derived/GWAS_results/ageing_axis_meta_results.csv")
Run CPASSOC for the baseline rate of mortality
Show the code
baseline_mortality_axis_Arya_f_T <- Arya_f_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)baseline_mortality_axis_Arya_m_T <- Arya_m_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)baseline_mortality_axis_Huang_f_18_T <- Huang_f_18_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)baseline_mortality_axis_Huang_m_18_T <- Huang_m_18_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)baseline_mortality_axis_Huang_f_25_T <- Huang_f_25_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)baseline_mortality_axis_Huang_m_25_T <- Huang_m_25_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)baseline_mortality_axis_Huang_f_28_T <- Huang_f_28_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)baseline_mortality_axis_Huang_m_28_T <- Huang_m_28_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)baseline_mortality_axis_Wilson_f_1_T <- Wilson_f_baseline_mortality_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_1 = T)baseline_mortality_axis_Wilson_f_2_T <- Wilson_f_baseline_mortality_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_2 = T)baseline_mortality_axis_Durham_f_T <- Durham_f_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f = T)baseline_mortality_axis_Patel_f_T <- Patel_f_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f = T)baseline_mortality_axis_t_stats <- baseline_mortality_axis_Arya_f_T %>%inner_join(baseline_mortality_axis_Arya_m_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_f_18_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_m_18_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_f_25_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_m_25_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_f_28_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_m_28_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Wilson_f_1_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Wilson_f_2_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Durham_f_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Patel_f_T, by ="SNP") baseline_mortality_axis_t_stat_values <- baseline_mortality_axis_t_stats %>% dplyr::select(2:13)Sample_size_baseline_mortality_axis <-c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)if(!file.exists("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")) {# run the homogeneous effect meta-analysisS_hom <-SHom(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits (e.g. if a SNP has a sex or enviornment opposite effect on lifespan)# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix);S_het <-SHet(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)baseline_mortality_axis_meta_results <- baseline_mortality_axis_t_stats %>%bind_cols(p_S_hom, p_S_het) # add the unadjusted p valueswrite_csv(baseline_mortality_axis_meta_results, "data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")} else baseline_mortality_axis_meta_results <-read_csv("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")
Visualise the results
We combine GWAS \(T\) statistics calculated for the rate of ageing and baseline mortality measured across different contexts. It’s possible that some SNPs have G x E interactions that lead to a heterogeneous effect across phenotypes. We therefore utilise the S_het calculated p-values.
First lets show the effect of CPASSOC on the number of variants found to be associated with the rate of ageing and the scaling of mortality risk.
Table SX. the number of variants associated with ageing rate and baseline mortality at various significance thresholds, estimated by univariate GWAS or CPASSOC. The number of genomic regions indicates the number of variants detected after LD pruning.
Table SX. genes that encompass or are very close to the genetic variants that have associations with the rate of ageing.
Show the code
# join gene annotations with the list of analysed variants # note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.ageing_rate_genes <- ageing_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^9, 3)/10^9,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)ageing_rate_genes %>%my_data_table()
Table SX. genes that encompass or are very close to the genetic variants that have associations with baseline mortality rate.
Figure XX. Manhattan plots showing the -Log10p-value for each locus’ effect on baseline mortality and the rate of ageing.
Plot the univariate effect sizes for each of the genomic regions associated with the rate of ageing at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC.
Show the code
SNP_heatmap_ageing_axis <- SNP_ageing_axis_data %>%inner_join( ageing_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%inner_join(Genomic_regions))row_name <- SNP_heatmap_ageing_axis$SNPSNP_heatmap_ageing_axis <- SNP_heatmap_ageing_axis %>% dplyr::select(-SNP) %>%as.matrix()rownames(SNP_heatmap_ageing_axis) <- row_namebreaksList <-seq(-0.1, 0.1, by =0.001)annotation_SNPs <- ageing_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%mutate(Chromosome =case_when(str_detect(SNP, "2L") ~"2L",str_detect(SNP, "2R") ~"2R",str_detect(SNP, "3L") ~"3L",str_detect(SNP, "3R") ~"3R",str_detect(SNP, "X") ~"X"))annotation_studies <-tibble(Study =c("Arya_2010_f_25","Huang_2020_f_18","Huang_2020_f_25","Huang_2020_f_28","Wilson_2020_f_25_1","Wilson_2020_f_25_2","Durham_2014_f_25","Patel_2021_f_23","Arya_2010_m_25","Huang_2020_m_18","Huang_2020_m_25","Huang_2020_m_28"),Temperature =c("25","18","25","28","25","25","25","23","25","18","25","28")) %>%mutate(Sex =case_when(str_detect(Study, "_f") ~"Female",.default ="Male"),Mating =case_when(str_detect(Study, "Arya") ~"NO",str_detect(Study, "Huang") ~"Throughout life",str_detect(Study, "Wilson") ~"Early life",str_detect(Study, "Durham") ~"Throughout life",str_detect(Study, "Patel") ~"Early life"),Author =str_extract(Study, ".*(?=\\_)"),Author =str_remove(Author, "_f"),Author =str_remove(Author, "_m"))# create a study annotation column, need this to be a data.frame rather than a tibble for some reason Study_details <- annotation_studies %>%as.data.frame() %>% dplyr::select(Study, Temperature, Mating)my_categories <-data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],Mating = Study_details[, 3])my_colors <-list(Temperature =c("18"="#7bbcd5", # sailboat colours from pnw"23"="#d0e2af","25"="#f5db99","28"="#e89c81"),Mating =c("NO"="#f8e3d1", # Shuksan from pnw"Early life"="#d7b1c5","Throughout life"="#ac8eab"),Chromosome =c("2L"="#d8aedd", # lake colours from pnw"2R"="#cb74ad","3L"="#11c2b5","3R"="#72e1e1","X"="#fbcc74"))# create a SNP annotation columnSNP_details <- annotation_SNPs %>%as.data.frame()my_SNP_categories <-data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])my_col_names <-c("Arya et al females", "Huang et al females", "Huang et al females","Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females","Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males","Huang et al males")pheatmap(SNP_heatmap_ageing_axis, breaks = breaksList, main ="",color =colorRampPalette(rev(met.brewer("Benedictus", direction =1)))(length(breaksList)),legend =TRUE, cutree_rows =6, cutree_cols =5, angle_col =45, border_color ="white",annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,fontsize =8, labels_col = my_col_names)
Figure XX. univariate effect sizes for each of the genomic regions associated with ageing rate at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases ageing rate in the conditions the study was performed in.
Plot the univariate effect sizes for each of the genomic regions associated with the scaling of mortality risk at the genome-wide significance threshold (p < \(0.05^{-8}\)) after CPASSOC.
Show the code
SNP_heatmap_baseline_mortality_axis <- SNP_baseline_mortality_axis_data %>%inner_join( baseline_mortality_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%inner_join(Genomic_regions))row_name <- SNP_heatmap_baseline_mortality_axis$SNPSNP_heatmap_baseline_mortality_axis <- SNP_heatmap_baseline_mortality_axis %>% dplyr::select(-SNP) %>%as.matrix()rownames(SNP_heatmap_baseline_mortality_axis) <- row_namebreaksList <-seq(-7, 7, by =0.01)annotation_SNPs <- baseline_mortality_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%mutate(Chromosome =case_when(str_detect(SNP, "2L") ~"2L",str_detect(SNP, "2R") ~"2R",str_detect(SNP, "3L") ~"3L",str_detect(SNP, "3R") ~"3R",str_detect(SNP, "X") ~"X"))annotation_studies <-tibble(Study =c("Arya_2010_f_25","Huang_2020_f_18","Huang_2020_f_25","Huang_2020_f_28","Wilson_2020_f_25_1","Wilson_2020_f_25_2","Durham_2014_f_25","Patel_2021_f_23","Arya_2010_m_25","Huang_2020_m_18","Huang_2020_m_25","Huang_2020_m_28"),Temperature =c("25","18","25","28","25","25","25","23","25","18","25","28")) %>%mutate(Sex =case_when(str_detect(Study, "_f") ~"Female",.default ="Male"),Mating =case_when(str_detect(Study, "Arya") ~"NO",str_detect(Study, "Huang") ~"Throughout life",str_detect(Study, "Wilson") ~"Early life",str_detect(Study, "Durham") ~"Throughout life",str_detect(Study, "Patel") ~"Early life"),Author =str_extract(Study, ".*(?=\\_)"),Author =str_remove(Author, "_f"),Author =str_remove(Author, "_m"))# create a study annotation column, need this to be a data.frame rather than a tibble for some reason Study_details <- annotation_studies %>%as.data.frame() %>% dplyr::select(Study, Temperature, Mating)my_categories <-data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],Mating = Study_details[, 3])my_colors <-list(Temperature =c("18"="#7bbcd5", # sailboat colours from pnw"23"="#d0e2af","25"="#f5db99","28"="#e89c81"),Mating =c("NO"="#f8e3d1", # Shuksan from pnw"Early life"="#d7b1c5","Throughout life"="#ac8eab"),Chromosome =c("2L"="#d8aedd", # lake colours from pnw"2R"="#cb74ad","3L"="#11c2b5","3R"="#72e1e1","X"="#fbcc74"))# create a SNP annotation columnSNP_details <- annotation_SNPs %>%as.data.frame()my_SNP_categories <-data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])my_col_names <-c("Arya et al females", "Huang et al females", "Huang et al females","Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females","Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males","Huang et al males")pheatmap(SNP_heatmap_baseline_mortality_axis, breaks = breaksList, main ="",color =colorRampPalette(rev(met.brewer("Benedictus", direction =1)))(length(breaksList)),legend =TRUE, cutree_rows =6, cutree_cols =5, angle_col =45, border_color ="white",annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,fontsize =8, labels_col = my_col_names)
Figure XX. univariate effect sizes for each of the SNPs associated with mortality scaling at the genome-wide significance threshold (p < \(0.05^{-8}\)) after CPASSOC. Effect sizes are expressed in standard deviations from the mean life expectancy found in each study. Studies are clustered by similiarity in SNP effects on the X axis, while SNPs are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases life expectancy in the conditions the study was performed in.
Are ageing and baseline mortality polygenic?
If traits are polygenic, the majority of the genetic variants that effect their expession will have effects that are too small to detect with GWA, unless sample sizes are truly gigantic. A promising alternative is to instead look to see if effects estimated in one study can be replicated in a second, independent study. To test this in our dataset, we selected one trait measurement from each study trait that phenotyped females, at 25C, with an opportunity for mating.
As a control, this is what happens if we bin and plot the relationship between two uncorrelated variables
ageing_boyle_data <- SNP_ageing_axis_LD_pruned %>% dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>%filter_at(vars(2:4), all_vars(!is.na(.))) %>%# remove NAsarrange(Huang_2020_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Huang_2020_25_Female =mean(Huang_2020_25_Female), Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_2014_25_Female =mean(Durham_2014_25_Female))ageing_boyle_data_2 <- SNP_ageing_axis_LD_pruned %>% dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>%filter_at(vars(2:3), all_vars(!is.na(.))) %>%# remove NAsarrange(Wilson_2020_1_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_2014_25_Female =mean(Durham_2014_25_Female))baseline_mortality_boyle_data <- SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_25_Female) %>%filter_at(vars(2:4), all_vars(!is.na(.))) %>%# remove NAsarrange(Huang_2020_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Huang_2020_25_Female =mean(Huang_2020_25_Female), Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_25_Female =mean(Durham_25_Female))baseline_mortality_boyle_data_2 <- SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_25_Female) %>%filter_at(vars(2:3), all_vars(!is.na(.))) %>%# remove NAsarrange(Wilson_2020_1_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_25_Female =mean(Durham_25_Female))boyle_plot_H_W <- ageing_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-0.16, 0.16), ylim =c(-0.1, 0.1)) +xlab("Ageing SNP effect (Huang et al.)") +ylab("Ageing SNP effect (Wilson et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))boyle_plot_H_D <- ageing_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Durham_2014_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-0.16, 0.16), ylim =c(-0.1, 0.1)) +labs(x ="Ageing SNP effect (Huang et al.)",y ="Ageing SNP effect (Durham et al.)") +theme_bw() +theme(plot.title =element_text(hjust =0.5),text =element_text(size =10))boyle_plot_W_D <- ageing_boyle_data_2 %>%ggplot(aes(Wilson_2020_1_25_Female, Durham_2014_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-0.16, 0.16), ylim =c(-0.1, 0.1)) +xlab("Ageing SNP effect (Wilson et al.)") +ylab("Ageing SNP effect (Durham et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))boyle_baseline_plot_H_W <- baseline_mortality_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-5, 5), ylim =c(-3.125, 3.125)) +labs(x ="Scaling SNP effect (Huang et al.)",y ="Scaling SNP effect (Wilson et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))boyle_baseline_plot_H_D <- baseline_mortality_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Durham_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-5, 5), ylim =c(-3.125, 3.125)) +labs(x ="Scaling SNP effect (Huang et al.)",y ="Scaling SNP effect (Durham et al.)") +theme_bw() +theme(plot.title =element_text(hjust =0.5),text =element_text(size =10))boyle_baseline_plot_W_D <- baseline_mortality_boyle_data_2 %>%ggplot(aes(Wilson_2020_1_25_Female, Durham_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-5, 5), ylim =c(-3.125, 3.125)) +labs(x ="Scaling SNP effect (Wilson et al.)",y ="Scaling SNP effect (Durham et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))(boyle_plot_H_W + boyle_plot_H_D + boyle_plot_W_D) / (boyle_baseline_plot_H_W + boyle_baseline_plot_H_D + boyle_baseline_plot_W_D)
Figure SX. Each point represents the mean effect size for a group of 100 genomic regions, ordered by association with female ageing rate (top panels) or female baseline mortality risk (bottom panels), measured in the study named on the x-axis. While traits were measured different laboratories, conditions were similar in each treatment: females were housed at 25C, with an opportunity for mating. Effect sizes are expressed as trait standard deviations.
Figure 4
Show the code
f4_a <- c +labs(title =NULL) +theme(legend.position="none")f4_b <- e +labs(title =NULL) +theme(legend.position ="none")f4_c <- g +labs(title =NULL) +theme(legend.position ="none")part_1 <- (f4_a + f4_b + f4_c) +plot_layout(#guides = collect, axis_titles ="collect")f4_e <- boyle_plot_H_W +labs(x ="SNP effect (Huang et al.)",y ="SNP effect (Wilson et al.)") f4_f <- boyle_plot_H_D +labs(x ="SNP effect (Huang et al.)",y ="SNP effect (Durham et al.)") f4_g <- boyle_plot_W_D +labs(x ="SNP effect (Wilson et al.)",y ="SNP effect (Durham et al.)") part_3 <- (f4_e + f4_f + f4_g)part_1 / (ageing_axis_het_plot +labs(title =NULL)) / part_3 +plot_annotation(tag_levels ="A")
Figure 4. detection of genetic variants associated with the rate of ageing. A-C demonstrate our ageing rate metric used for genome-wide association analysis. Dashed lines show simulations from the gompertz distribution: each line was generated with a different rate of ageing value and extends as the baseline mortality rate changes. Note that the slope from the regressions of lifespan equality on life expectancy align closely with these curves. Points show fly genotypes; deviations from the regression line therefore indicate that genotypes differ in the rate of ageing.
Source Code
---title: "Genome wide analyses"format: htmleditor: source #markdown: # wrap: 72execute: warning: false message: false---# Load packages and dataThe `MASS` package is required to run the `CPASSOC`. Unfortunately this clashes with the `dplyr``select()`. So be prepared to use `dplyr::select()` to get some things to work if you're adapting the code for your own use.```{r}#| results: hidelibrary(tidyverse) # tidy coding, ggplot etclibrary(data.table) # for the rleid functionlibrary(glue) # for coding within stringslibrary(bigsnpr) # to install: devtools::install_github("privefl/bigsnpr")#library(pander) # for slick simple tableslibrary(kableExtra) # for medium sized tableslibrary(DT) # for large, searchable tableslibrary(brms) # for bayesian modelslibrary(tidybayes) # for bayesian plotslibrary(ggtext) # for markdown syntax in ggplotlibrary(ggnewscale) # to reset colour scaleslibrary(MetBrewer) # for more colour paletteslibrary(MoMAColors) # nicer colours once againlibrary(PNWColors) # even more colours#library(hexbin) # for density heat maps#library(rcartocolor) # even more nice colourslibrary(patchwork) # for combining plots#library(ggrepel) # for labelling ggplotslibrary(pheatmap) # for heat mapslibrary(MASS) # needed for CPASSOClibrary(Matrix) # needed for CPASSOC#library(flexsurv) # for survival analysis#library(rptR) # for finding the intraclass correlation coefficient# build a helper function that produces a table to display our data# Create a function to build HTML searchable tablesmy_data_table <-function(df){datatable( df, rownames=FALSE,autoHideNavigation =TRUE,extensions =c("Scroller", "Buttons"),options =list(autoWidth =TRUE,dom ='Bfrtip',deferRender=TRUE,scrollX=TRUE, scrollY=1000,scrollCollapse=TRUE,buttons =list('pageLength', 'colvis', 'csv', list(extend ='pdf',pageSize ='A4',orientation ='landscape',filename ='Lifespan_data')),pageLength =100 ) )}```## Load variant/gene annotationsDGRP variant annotations were downloaded from the [DGRP website](http://dgrp2.gnets.ncsu.edu/data/website/dgrp.fb557.annot.txt) and gene annotations for all the genes covered by DGRP variants, from the `org.Dm.eg.db` database object from `Bioconductor`.These will be useful later when we aim to identify whether variants with notable associations with a trait overlap with any genes.```{r}#| results: hide# Helper function to split a vector into chunks chunker <-function(x, max_chunk_size) split(x, ceiling(seq_along(x) / max_chunk_size))if(!file.exists("data/derived/annotations.csv")){# Load annotation file, get important info annot <-read.table("data/input/dgrp.fb557.annot.txt", header =FALSE, stringsAsFactors =FALSE) get.info <-function(rows){lapply(rows, function(row){ site.class.field <-strsplit(annot$V3[row], split ="]")[[1]][1] num.genes <-str_count(site.class.field, ";") +1 output <-cbind(rep(annot$V1[row], num.genes), do.call("rbind", lapply(strsplit(site.class.field, split =";")[[1]], function(x) strsplit(x, split ="[|]")[[1]])))if(ncol(output) ==5) return(output[,c(1,2,4,5)]) # only return SNPs that have some annotation. Don't get the gene symbolelsereturn(NULL) }) %>%do.call("rbind", .) } variant.details <-lapply(chunker(1:nrow(annot), max_chunk_size =10000), get.info) %>%do.call("rbind", .) %>%as.data.frame()names(variant.details) <-c("SNP", "FBID", "site.class", "distance.to.gene") variant.details$FBID <-unlist(str_extract_all(variant.details$FBID, "FBgn[:digit:]+")) # clean up text strings for Flybase ID variant.details %>% dplyr::filter(site.class !="FBgn0003638") %>%# NB this is a bug in the DGRP's annotation filemutate(chr =str_remove_all(substr(SNP, 1, 2), "_")) # get chromosome now for faster sorting later annotations <- variant.details} else annotations <-read_csv("data/derived/annotations.csv")annotations <- annotations %>%left_join(read.csv("data/Input/all_dmel_genes.csv")) %>% dplyr::select(SNP, FBID, site.class, distance.to.gene, gene_name, chromosome)```## The raw dataset```{r}raw_data <-read_delim("data/Input/Raw_data/all_raw_data.csv",delim=',') %>%mutate(line =as.factor(line),Treatment =as.character(Treatment)) %>%unite(Treatment, c("Study", "Treatment", "Sex"), sep ="_") %>%filter(Genotyped =="YES") %>%#unite("Treatment", c(Treatment, Sex), sep = "_") %>% dplyr::select(line, Lifespan, Treatment, Vial_ID)```## Line mean dataIn the `demographic component` of this study, we calculated mean values and standard error for each combination of line, sex, study, temperature and mating status. These data are displayed, and can be downloaded from the below table. Note that for quantitative genetic, GWA and other SNP based analysis, we removed lines that had not been genotyped (shown as `Genotyped = NO`). Lines with unknown genotypes also have unknown Wolbachia and inversions status'. Durham et al (2014) cleared all lines of Wolbachia via treatment with tetracycline.```{r}#| results: hidegenotyped_lines <-read_csv("data/input/Genotyped_lines.csv") %>%mutate(Genotyped ="YES",line =as.factor(line))full_dataset <-read.csv("data/input/lifespan_data.csv") %>%as_tibble() %>%mutate(line =as.factor(Line),Treatment =str_replace(Treatment, " ", "_"),Treatment =case_when(Temperature ==18& Study =="Huang_2020"~"Huang_2020_1", Temperature ==25& Study =="Huang_2020"~"Huang_2020_2", Temperature ==28& Study =="Huang_2020"~"Huang_2020_3",.default = Treatment)) %>% dplyr::select(-Line) %>%left_join(genotyped_lines, by ="line") %>%mutate(Genotyped =if_else(is.na(Genotyped), "NO", Genotyped)) %>% dplyr::select(line, Sex, Temperature, Mated, Study, Treatment, Block, e0, SE_e0, h, SE_h, samp, Genotyped)# DGRP studies often correct for the most common inversions and wolbachia presence. inversions_wolbachia <-read_csv("data/Input/inversions_wolbachia.csv") %>%mutate(line =as.factor(str_remove(line, "DGRP_")),Wolbachia =if_else(Wolbachia =="y", 1, 0),across(2:17, ~case_when(.x =="ST"~0, .x =="INV/ST"~1, .x =="INV"~2))) %>% dplyr::select(line, `In(2L)t`, `In(2R)NS`, `In(3R)P`, `In(3R)K`, `In(3R)Mo`, Wolbachia) %>%rename(In_2L_t =`In(2L)t`,In_2R_NS =`In(2R)NS`,In_3R_P =`In(3R)P`,In_3R_K =`In(3R)K`,In_3R_Mo =`In(3R)Mo`)# inversions pruned to those Huang et al 2015 PNAS corrected forfull_dataset <- full_dataset %>%left_join(inversions_wolbachia) %>%mutate(Wolbachia =if_else(Study =="Durham_2014", 0, Wolbachia)) # study cleared wolbachia with tetracycline before phenotyping my_data_table(full_dataset %>%mutate(across(8:11, ~round(.x, 2))) %>% dplyr::select(1:13))```# $\mathrm{CV}_G$The coefficient of genetic variation is$$\mathrm{CV}_G = \frac{100\sqrt{\sigma^2_G}}{\overline{x}}$$ where $\sigma^2_G$ is the genetic variance in the trait of interest and $\overline{x}$ is the mean trait value. This metric allows comparison of genetic variances between traits expressed on different scales. We use it here to get a compare the extent genetic variation in life expectancy and lifespan equality.First, let's calculate the $\mathrm{CV}_G$ in life expectancy using individual-level data. While we're at it, we can also calculate broad-sense heritability. ```{r}# get conventional H^2 for lifespanif(!file.exists("data/Derived/heritability/conventional_e0.csv")){# Arya femalesArya_2010_1_Female_raw <- raw_data %>%filter(Treatment =="Arya_2010_1_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Arya_2010_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Arya_2010_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <-tibble(e0_heritability = Arya_2010_1_Female_H2_model$R[[1]],SE = Arya_2010_1_Female_H2_model$se[1,],Treatment =unique(Arya_2010_1_Female_raw$Treatment))# CVG Arya_2010_f_summ <-summary(Arya_2010_1_Female_H2_model$mod)CV_G <-tibble(V_G =rnorm(4000, mean =93.55, sd =9.672),mean_trait_value =rnorm(4000, mean =57.080, sd =0.776)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Arya_2010_1_Female_raw$Treatment))# Arya malesArya_2010_1_Male_raw <- raw_data %>%filter(Treatment =="Arya_2010_1_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Arya_2010_1_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Arya_2010_1_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Arya_2010_1_Male_H2_model$R[[1]],SE = Arya_2010_1_Male_H2_model$se[1,],Treatment =unique(Arya_2010_1_Male_raw$Treatment)) )#CVG Arya_2010_m_summ <-summary(Arya_2010_1_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =98.29, sd =9.914),mean_trait_value =rnorm(4000, mean =52.9947, sd =0.7926)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Arya_2010_1_Male_raw$Treatment)))# Huang 18C femalesHuang_2020_1_Female_raw <- raw_data %>%filter(Treatment =="Huang_2020_1_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Huang_2020_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Huang_2020_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Huang_2020_1_Female_H2_model$R[[1]],SE = Huang_2020_1_Female_H2_model$se[1,],Treatment =unique(Huang_2020_1_Female_raw$Treatment)) )#CVGHuang_2020_1_Female_summ <-summary(Huang_2020_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =423.5, sd =20.58),mean_trait_value =rnorm(4000, mean =79.143, sd =1.543)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Huang_2020_1_Female_raw$Treatment)) )# Huang 18C malesHuang_2020_1_Male_raw <- raw_data %>%filter(Treatment =="Huang_2020_1_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Huang_2020_1_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Huang_2020_1_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Huang_2020_1_Male_H2_model$R[[1]],SE = Huang_2020_1_Male_H2_model$se[1,],Treatment =unique(Huang_2020_1_Male_raw$Treatment)) )#CVGHuang_2020_1_Male_summ <-summary(Huang_2020_1_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =461.2, sd =21.48),mean_trait_value =rnorm(4000, mean =86.37, sd =1.61)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Huang_2020_1_Male_raw$Treatment)) )# Huang 25C femalesHuang_2020_2_Female_raw <- raw_data %>%filter(Treatment =="Huang_2020_2_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Huang_2020_2_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Huang_2020_2_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Huang_2020_2_Female_H2_model$R[[1]],SE = Huang_2020_2_Female_H2_model$se[1,],Treatment =unique(Huang_2020_2_Female_raw$Treatment)) )#CVGHuang_2020_2_Female_summ <-summary(Huang_2020_2_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =90.46, sd =9.511),mean_trait_value =rnorm(4000, mean =42.7445, sd =0.7069)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Huang_2020_2_Female_raw$Treatment)) )# Huang 25C malesHuang_2020_2_Male_raw <- raw_data %>%filter(Treatment =="Huang_2020_2_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Huang_2020_2_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Huang_2020_2_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Huang_2020_2_Male_H2_model$R[[1]],SE = Huang_2020_2_Male_H2_model$se[1,],Treatment =unique(Huang_2020_2_Male_raw$Treatment)) )#CVGHuang_2020_2_Male_summ <-summary(Huang_2020_2_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =105.3, sd =10.26),mean_trait_value =rnorm(4000, mean =45.2978, sd =0.7611)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Huang_2020_2_Male_raw$Treatment)) )# Huang 28C femalesHuang_2020_3_Female_raw <- raw_data %>%filter(Treatment =="Huang_2020_3_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Huang_2020_3_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Huang_2020_3_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Huang_2020_3_Female_H2_model$R[[1]],SE = Huang_2020_3_Female_H2_model$se[1,],Treatment =unique(Huang_2020_3_Female_raw$Treatment)) )#CVGHuang_2020_3_Female_summ <-summary(Huang_2020_3_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =41.47, sd =6.440),mean_trait_value =rnorm(4000, mean =28.207, sd =0.492)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Huang_2020_3_Female_raw$Treatment)) )# Huang 28C malesHuang_2020_3_Male_raw <- raw_data %>%filter(Treatment =="Huang_2020_3_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Huang_2020_3_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Huang_2020_3_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Huang_2020_3_Male_H2_model$R[[1]],SE = Huang_2020_3_Male_H2_model$se[1,],Treatment =unique(Huang_2020_3_Male_raw$Treatment)) )#CVGHuang_2020_3_Male_summ <-summary(Huang_2020_3_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =43.91, sd =6.627),mean_trait_value =rnorm(4000, mean =27.8709, sd =0.5054)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Huang_2020_3_Male_raw$Treatment)) )# Wilson females 1Wilson_2020_1_Female_raw <- raw_data %>%filter(Treatment =="Wilson_2020_1_Female")Wilson_2020_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Wilson_2020_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Wilson_2020_1_Female_H2_model$R[[1]],SE = Wilson_2020_1_Female_H2_model$se[1,],Treatment =unique(Wilson_2020_1_Female_raw$Treatment)) )#CVGWilson_2020_1_Female_summ <-summary(Wilson_2020_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =97.04, sd =9.851),mean_trait_value =rnorm(4000, mean =40.5357, sd =0.7809)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Wilson_2020_1_Female_raw$Treatment)) )# Wilson females 2Wilson_2020_2_Female_raw <- raw_data %>%filter(Treatment =="Wilson_2020_2_Female")Wilson_2020_2_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Wilson_2020_2_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Wilson_2020_2_Female_H2_model$R[[1]],SE = Wilson_2020_2_Female_H2_model$se[1,],Treatment =unique(Wilson_2020_2_Female_raw$Treatment)) )#CVGWilson_2020_2_Female_summ <-summary(Wilson_2020_2_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =69.82, sd =8.356),mean_trait_value =rnorm(4000, mean =32.2761, sd =0.6621)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Wilson_2020_2_Female_raw$Treatment)) )# Durham femalesDurham_2014_1_Female_raw <- raw_data %>%filter(Treatment =="Durham_2014_1_Female")Durham_2014_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Durham_2014_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Durham_2014_1_Female_H2_model$R[[1]],SE = Durham_2014_1_Female_H2_model$se[1,],Treatment =unique(Durham_2014_1_Female_raw$Treatment)) )#CVGDurham_2014_1_Female_summ <-summary(Durham_2014_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =76.95, sd =8.772),mean_trait_value =rnorm(4000, mean =36.1214, sd =0.6892)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Durham_2014_1_Female_raw$Treatment)) )# Patel femalesPatel_2021_1_Female_raw <- raw_data %>%filter(Treatment =="Patel_2021_1_Female")Patel_2021_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Patel_2021_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Patel_2021_1_Female_H2_model$R[[1]],SE = Patel_2021_1_Female_H2_model$se[1,],Treatment =unique(Patel_2021_1_Female_raw$Treatment)) )#CVGPatel_2021_1_Female_summ <-summary(Patel_2021_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =145.1, sd =12.05),mean_trait_value =rnorm(4000, mean =33.0666, sd =0.8901)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Patel_2021_1_Female_raw$Treatment)) )# Dick 1 femalesDick_2011_1_Female_raw <- raw_data %>%filter(Treatment =="Dick_2011_1_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Dick_2011_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Dick_2011_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Dick_2011_1_Female_H2_model$R[[1]],SE = Dick_2011_1_Female_H2_model$se[1,],Treatment =unique(Dick_2011_1_Female_raw$Treatment)) )#CVGDick_2011_1_Female_summ <-summary(Dick_2011_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =34.08, sd =5.837),mean_trait_value =rnorm(4000, mean =29.854, sd =1.019)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Dick_2011_1_Female_raw$Treatment)) )# Dick 1 malesDick_2011_1_Male_raw <- raw_data %>%filter(Treatment =="Dick_2011_1_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Dick_2011_1_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Dick_2011_1_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Dick_2011_1_Male_H2_model$R[[1]],SE = Dick_2011_1_Male_H2_model$se[1,],Treatment =unique(Dick_2011_1_Male_raw$Treatment)) )#CVGDick_2011_1_Male_summ <-summary(Dick_2011_1_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =41.15, sd =6.415),mean_trait_value =rnorm(4000, mean =27.834, sd =1.113)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Dick_2011_1_Male_raw$Treatment)) )# Dick 2 femalesDick_2011_2_Female_raw <- raw_data %>%filter(Treatment =="Dick_2011_2_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Dick_2011_2_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Dick_2011_2_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Dick_2011_2_Female_H2_model$R[[1]],SE = Dick_2011_2_Female_H2_model$se[1,],Treatment =unique(Dick_2011_2_Female_raw$Treatment)) )#CVGDick_2011_2_Female_summ <-summary(Dick_2011_2_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =26.47, sd =5.145),mean_trait_value =rnorm(4000, mean =23.1871, sd =0.8972)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Dick_2011_2_Female_raw$Treatment)) )# Dick 2 malesDick_2011_2_Male_raw <- raw_data %>%filter(Treatment =="Dick_2011_2_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Dick_2011_2_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Dick_2011_2_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Dick_2011_2_Male_H2_model$R[[1]],SE = Dick_2011_2_Male_H2_model$se[1,],Treatment =unique(Dick_2011_2_Male_raw$Treatment)) )#CVGDick_2011_2_Male_summ <-summary(Dick_2011_2_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =23.33, sd =4.831),mean_trait_value =rnorm(4000, mean =19.9551, sd =0.8401)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Dick_2011_2_Male_raw$Treatment)) )# Dick 3 femalesDick_2011_3_Female_raw <- raw_data %>%filter(Treatment =="Dick_2011_3_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Dick_2011_3_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Dick_2011_3_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Dick_2011_3_Female_H2_model$R[[1]],SE = Dick_2011_3_Female_H2_model$se[1,],Treatment =unique(Dick_2011_3_Female_raw$Treatment)) )#CVGDick_2011_3_Female_summ <-summary(Dick_2011_3_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rexp(4000, rate =1/5.039), # note the use of rexp instead of rnorm to avoid neg variance valuesmean_trait_value =rnorm(4000, mean =30.4241, sd =0.8665)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Dick_2011_3_Female_raw$Treatment)) )# Dick 3 malesDick_2011_3_Male_raw <- raw_data %>%filter(Treatment =="Dick_2011_3_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Dick_2011_3_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Dick_2011_3_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Dick_2011_3_Male_H2_model$R[[1]],SE = Dick_2011_3_Male_H2_model$se[1,],Treatment =unique(Dick_2011_3_Male_raw$Treatment)) )#CVGDick_2011_3_Male_summ <-summary(Dick_2011_3_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =40.45, sd =6.360),mean_trait_value =rnorm(4000, mean =27.75, sd =2.15)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Dick_2011_3_Male_raw$Treatment)) )# Hoffman 1 femalesHoffman_2021_1_Female_raw <- raw_data %>%filter(Treatment =="Hoffman_2021_1_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Hoffman_2021_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Hoffman_2021_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Hoffman_2021_1_Female_H2_model$R[[1]],SE = Hoffman_2021_1_Female_H2_model$se[1,],Treatment =unique(Hoffman_2021_1_Female_raw$Treatment)) )#CVGHoffman_2021_1_Female_summ <-summary(Hoffman_2021_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =162.1, sd =12.73),mean_trait_value =rnorm(4000, mean =53.241, sd =3.208)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Hoffman_2021_1_Female_raw$Treatment)) )# Hoffman 1 malesHoffman_2021_1_Male_raw <- raw_data %>%filter(Treatment =="Hoffman_2021_1_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Hoffman_2021_1_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Hoffman_2021_1_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Hoffman_2021_1_Male_H2_model$R[[1]],SE = Hoffman_2021_1_Male_H2_model$se[1,],Treatment =unique(Hoffman_2021_1_Male_raw$Treatment)) )#CVGHoffman_2021_1_Male_summ <-summary(Hoffman_2021_1_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =195.8, sd =13.99),mean_trait_value =rnorm(4000, mean =52.761, sd =3.516)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Hoffman_2021_1_Male_raw$Treatment)) )# Hoffman 2 femalesHoffman_2021_2_Female_raw <- raw_data %>%filter(Treatment =="Hoffman_2021_2_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Hoffman_2021_2_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Hoffman_2021_2_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Hoffman_2021_2_Female_H2_model$R[[1]],SE = Hoffman_2021_2_Female_H2_model$se[1,],Treatment =unique(Hoffman_2021_2_Female_raw$Treatment)) )#CVGHoffman_2021_2_Female_summ <-summary(Hoffman_2021_2_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =223.9, sd =14.96),mean_trait_value =rnorm(4000, mean =55.492, sd =4.333)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Hoffman_2021_2_Female_raw$Treatment)) )# Hoffman 2 malesHoffman_2021_2_Male_raw <- raw_data %>%filter(Treatment =="Hoffman_2021_2_Male") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Hoffman_2021_2_Male_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Hoffman_2021_2_Male_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Hoffman_2021_2_Male_H2_model$R[[1]],SE = Hoffman_2021_2_Male_H2_model$se[1,],Treatment =unique(Hoffman_2021_2_Male_raw$Treatment)) )#CVGHoffman_2021_2_Male_summ <-summary(Hoffman_2021_2_Male_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =233.8, sd =15.29),mean_trait_value =rnorm(4000, mean =59.240, sd =4.428)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Hoffman_2021_2_Male_raw$Treatment)) )# Zhao 1 femalesZhao_2022_1_Female_raw <- raw_data %>%filter(Treatment =="Zhao_2022_1_Female") #%>% #mutate(Vial_ID = as.factor(rleid(Vial_ID)))Zhao_2022_1_Female_H2_model <-rpt(Lifespan ~ (1|line), grname =c("line"), data = Zhao_2022_1_Female_raw, datatype ="Gaussian", nboot =1000, npermut =0)conventional_H2 <- conventional_H2 %>%bind_rows(tibble(e0_heritability = Zhao_2022_1_Female_H2_model$R[[1]],SE = Zhao_2022_1_Female_H2_model$se[1,],Treatment =unique(Zhao_2022_1_Female_raw$Treatment)) )#CVGZhao_2022_1_Female_summ <-summary(Zhao_2022_1_Female_H2_model$mod)CV_G <- CV_G %>%bind_rows(tibble(V_G =rnorm(4000, mean =200.51, sd =14.160),mean_trait_value =rnorm(4000, mean =58.362, sd =3.172)) %>%mutate(CV_G =100*sqrt(V_G) / mean_trait_value) %>% dplyr::select(V_G, CV_G) %>%summarise_draws(mean, ~quantile(.x, probs =c(0.025, 0.975))) %>%mutate(Treatment =unique(Zhao_2022_1_Female_raw$Treatment)) )write_csv(conventional_H2, "data/Derived/heritability/conventional_e0.csv")write_csv(CV_G, "data/Derived/heritability/conventional_CVG.csv")} else { conventional_H2 <-read_delim("data/Derived/heritability/conventional_e0.csv") CV_G <-read_delim("data/Derived/heritability/conventional_CVG.csv")}```We can also calculate $\mathrm{CV}_G$ directly from line (genotype) means. ```{r}CVG_data <- full_dataset %>%unite(Treatment, c("Treatment", "Sex"), sep ="_") %>%filter(Genotyped =="YES")# make a function to update the model and the posterior sample output with the desired traitCVG_e0_calculator <-function(selected_treatment){ data <- CVG_data %>%filter(Treatment == selected_treatment) model <-update(e0_VG_model, newdata = data) posterior <-as_draws_df(model) %>% dplyr::select(b_Intercept, sigma) %>%mutate(VG = sigma^2,CVG =100*sqrt(VG) /abs(b_Intercept)) %>%# Houle 1992mutate(Trait ="e0",Treatment = selected_treatment) posterior}CVG_h_calculator <-function(selected_treatment){ data <- CVG_data %>%filter(Treatment == selected_treatment) %>%filter(!is.na(SE_h)) model <-update(h_VG_model, newdata = data) posterior <-as_draws_df(model) %>% dplyr::select(b_Intercept, sigma) %>%mutate(VG = sigma^2,CVG =100*sqrt(VG) /abs(b_Intercept)) %>%# Houle 1992mutate(Trait ="h",Treatment = selected_treatment) posterior}treatment_list <-unique(CVG_data$Treatment)# Run the modelsRun_function <-FALSE# Change this to TRUE to run the modelsif(Run_function){d <- CVG_data %>%filter(Treatment =="Arya_2010_1_Female")e0_VG_model <-brm(data = d,family =gaussian(), e0 |mi(SE_e0) ~1,chains =4, cores =4, seed =1, iter =6000, warmup =2000)h_VG_model <-brm(data = d,family =gaussian(), h |mi(SE_h) ~1,chains =4, cores =4, seed =1, iter =6000, warmup =2000) CVG_data_e0 <-map_dfr(treatment_list, CVG_e0_calculator)CVG_data_h <-map_dfr(treatment_list, CVG_h_calculator)CVG_data <-bind_rows(CVG_data_e0, CVG_data_h) CVG_data %>%write_csv("data/Derived/heritability/CVG_data.csv")} else { CVG_data <-read_csv("data/Derived/heritability/CVG_data.csv")}CVG_summarised <- CVG_data %>%group_by(Trait, Treatment) %>%summarise_draws(mean, sd, ~quantile(.x, probs =c(0.025, 0.975))) %>%ungroup() %>%mutate(across(4:7, ~round(.x, 1))) %>%pivot_wider(names_from ="variable", values_from =4:7)```How do the results of the two methods compare?```{r}CVG_comparison <- CV_G %>%filter(variable =="CV_G") %>%rename(mean_CVG_conventional = mean,`2.5%_CVG conventional`=`2.5%`,`97.5%_CVG conventional`=`97.5%`) %>% dplyr::select(-variable) %>%left_join( CVG_summarised %>%filter(Trait =="e0") %>%rename(line_mean_CVG = mean_CVG,`line 2.5%_CVG`=`2.5%_CVG`,`line 97.5%_CVG`=`97.5%_CVG`) %>% dplyr::select(Treatment, line_mean_CVG, `line 2.5%_CVG`, `line 97.5%_CVG`) )CVG_comparison %>%ggplot(aes(x = mean_CVG_conventional, y = line_mean_CVG)) +geom_abline(intercept =0, slope =1, linetype =2) +geom_point(size =2.5) +scale_x_continuous(limits =c(0, 40), expand =c(0, 0)) +scale_y_continuous(limits =c(0, 40), expand =c(0, 0)) +labs(x ="Life expectancy CVG estimated from individual data",y ="Life expectancy CVG estimated from line mean data") +theme_bw() +theme(text =element_text(size =12))```$\mathrm{CV}_G$ in life expectancy calculated from individual level data is $\approx$ $\mathrm{CV}_G$ calculated from line means. We therefore treat these line mean estimates as reasonable and use them to compare life expectancy and lifespan equality.**Table SX**. $CV_G$ estimates for life expectancy and lifespan equality, estimated from genotype means.```{r}CVG_summarised %>% dplyr::select(Trait, Treatment, mean_CVG, `2.5%_CVG`, `97.5%_CVG`) %>%pivot_wider(names_from = Trait, values_from =3:5) %>% dplyr::select(Treatment, mean_CVG_e0, `2.5%_CVG_e0`, `97.5%_CVG_e0`, mean_CVG_h,`2.5%_CVG_h`, `97.5%_CVG_h`) %>%rename(`Life expectancy CVG`= mean_CVG_e0,`Life expectancy 2.5%`=`2.5%_CVG_e0`,`Life expectancy 97.5%`=`97.5%_CVG_e0`,`Lifespan equality CVG`= mean_CVG_h,`Lifespan equality 2.5%`=`2.5%_CVG_h`,`Lifespan equality 97.5%`=`97.5%_CVG_h`) %>%kable() %>%kable_styling()```# Preparing for univariate GWASThe preparation of data for univariate GWAS generally follows [Holman and Wong's (2023)](https://academic.oup.com/evolut/article/77/12/2642/7279223) DGRP GWAS of fitness in different contexts. See their associated `workflowr`[report](https://lukeholman.github.io/fitnessGWAS/index.html) for a comprehensive breakdown of their data preparation.## Loading data used in GWA testsFor GWAS and later CPASSOC, we split the data by study, removed studies that phenotyped \< 100 lines and adjusted line means to account for experimental block where applicable. Importantly, we also split the Wilson et al (2020) data by dietary treatment; while we do not explicitly consider diet in our analysis, lifespan in one dietary treatment is considered a separate trait from lifespan measured in a second dietary treatment.```{r}Arya_2010_f <- full_dataset %>%filter(Study =="Arya_2010"& Sex =="Female"& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Arya_2010_m <- full_dataset %>%filter(Study =="Arya_2010"& Sex =="Male"& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_f_18 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Female"& Temperature ==18& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_m_18 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Male"& Temperature ==18& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_f_25 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Female"& Temperature ==25& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_m_25 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Male"& Temperature ==25& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_f_28 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Female"& Temperature ==28& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Huang_2020_m_28 <- full_dataset %>%filter(Study =="Huang_2020"& Sex =="Male"& Temperature ==28& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)# In this study, some lines were measured twice per treatment, and a small subset were measured three times. We take the mean across blocks as the line mean, following the original study.Wilson_2020_f_1 <- full_dataset %>%filter(Treatment =="Wilson_2020_1"& Genotyped =="YES") %>%group_by(line) %>%mutate(e0 =mean(e0),h =mean(h)) %>%ungroup() %>%distinct(line, .keep_all =TRUE) %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Wilson_2020_f_2 <- full_dataset %>%filter(Treatment =="Wilson_2020_2"& Genotyped =="YES") %>%group_by(line) %>%mutate(e0 =mean(e0),h =mean(h)) %>%ungroup() %>%distinct(line, .keep_all =TRUE) %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)# In this study, each line was measured three times. We take the mean across blocks as the line meanDurham_2014_f <- full_dataset %>%filter(Study =="Durham_2014"& Genotyped =="YES") %>%group_by(line) %>%mutate(e0 =mean(e0),h =mean(h)) %>%ungroup() %>%distinct(line, .keep_all =TRUE) %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)Patel_2021_f <- full_dataset %>%filter(Study =="Patel_2021"& Genotyped =="YES") %>%mutate(e0_scaled =scale(e0),h_scaled =scale(h)) %>% dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)```## Install neccessary software and build helper functionsIn addition to the `R` packages we load, `plink 1.9` and `beagle` must also be installed. These software packages allow us to wrangle the data into the correct format and impute missing values, both of which are required to run GWAS with the `plink`.`plink` is run from the terminal, but we pass the terminal command to `R` first, which then writes to the terminal. This makes our analysis reproducible. However, Windows and mac operating systems liase with the terminal differently, meaning different functions are required depending on your operating system. To make this easy we include the following code chunk, where you can specify whether you're a windows or mac user.```{r}#| echo: trueOperating_system <-"mac"# change this to "windows" if appropriate. Note that all downstream functions are informed by this``````{r}# build functions to prepare data for GWASprep_for_e0_GWAS <-function(data, sex){data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, e0)}prep_for_h_GWAS <-function(data, sex){data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, h)}prep_for_ageing_GWAS <-function(data){ data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, ageing_axis_centered)}prep_for_baseline_mortality_GWAS <-function(data){ data %>%mutate(line =glue("line{line}")) %>% dplyr::select(line, baseline_mortality_axis_centered)}# I used bigsnpr::download_plink(dir = "code/windows") which puts it in the code folder - I'm using a windows operating system. The macOS version can also be downloaded into "code/macOS" # Beagle is a software package for phasing genotypes and imputing ungenotyped markers.if(Operating_system =="mac"){plink <-paste(getwd(), "code/macOS/plink", sep ="/")}if(Operating_system =="windows"){plink <-paste(getwd(), "code/windows/plink", sep ="/")}# only need to download this once - change path depending on operating system#beagle <- bigsnpr::download_beagle(# dir = "/Users/tkeaney/Library/CloudStorage/OneDrive-JGU/Postdoc/DGRP_lifespan/DGRP_lifespan_inequality/code/macOS") # helper function to pass commands to the terminal# Note that we set `intern = TRUE`, and pass the result of `system()` to `cat()`, # ensuring that the Terminal output will be printed in this knitr report.# # This is the mac OS functionif(Operating_system =="mac"){ run_command_mac <-function(shell_command, wd =getwd(), path =""){cat(system(glue("cd ", wd, path, # tell terminal which directory to work in"\n",shell_command), # on a second terminal line, run the plink commandintern =TRUE), sep ='\n') }}# This is the windows function if(Operating_system =="windows"){ run_command_windows <-function(plink_command, wd =getwd(), path ="") {# Specify the full path to the plink executable within the 'code' subdirectory. plink_path <-paste(getwd(), "code/windows/plink", sep ="/")# Create the full shell command with the plink executable. command <-glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shQuote(plink_path)} {plink_command}")# Execute the combined command. result <-system(command, intern =TRUE)# Print the result.cat(result, sep ='\n')# Return the result as a character vector.return(result) }# sometimes we need to run terminal commands without plink - for windows, create a slightly different function to do this run_command_no_plink <-function(shell_command, wd =getwd(), path ="") {# Create the full shell command with the plink executable. command <-glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shell_command}")# Execute the combined command. result <-system(command, intern =TRUE)# Print the result.cat(result, sep ='\n')# Return the result as a character vector.return(result) }}```## Perform SNP quality control and impute missing dataPlink recognises three types of files that are necessary for GWA analysis: the `.bed`, `.bim` and `.fam` files.`.bed`: the binary biallelic genotype table. Four options are possible:- 00 = homozygous for minor allele- 01 = missing genotype- 10 = heterozygous genotype- 11 = homozygous for major alleleThe overwhelming majority of genotypes in the DGRP are homozygous for one of the alleles (i.e. 00 or 11).`.bim`: extended variant information file accompanying the `.bed` file. It has six fields:1. chromosome code2. variant identifier3. position in morgans4. base-pair coordinate5. Minor allele6. Major allele`.fam`: Plink sample information file. It can have the following elements:1. Family ID ('FID') (in our case just the DGRP line)2. Within-family ID ('IID'; cannot be '0') (in our case just the DGRP line)3. Within-family ID of father ('0' if father isn't in dataset)4. Within-family ID of mother ('0' if mother isn't in dataset)5. Sex code ('1' = male, '2' = female, '0' = unknown) - not important for us because we analyse the sexes separately.6. Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data): -9 for us because we supply the phenotypic data later.We cleaned up the DGRP’s .bed/.bim/.fam files (available from the Mackay lab [website](http://dgrp2.gnets.ncsu.edu/)) by:1. Removing any SNPs for which genotypes are missing in \>10% of the 205 DGRP lines. We then use the software `Beagle` to impute the remaining missing genotypes. Imputation takes about half an hour, so ideally you only want to do it once.2. Removing SNPs with a minor allele frequency of less than 5% across the 205 lines. We have negligible power to detect associations for rare SNPs that occur at frequencies below this threshold.In the plink-formatted genotype files, lines fixed for the major allele are coded as 2, and lines fixed for the minor allele as 0. SNPs with negative $\beta$ coefficients therefore indicate that the minor allele is associated with higher trait values, while positive effect sizes means that the minor allele is associated with lower trait values.```{r}Run_function <-FALSE# Change this to TRUE to run - read through the code before you do this if(Run_function){# Use Plink to clean and subset the DGRP's SNP data as follows:# Only keep SNPs for which at least 90% of the 205 DGRP lines were successfully genotyped (--geno 0.1)# Only keep SNPs with a minor allele frequency of 0.05 or higher (--maf 0.05) across the 205 lines# Write the processed BIM/BED/FAM files to the data/Derived/plink_output directory output_directory <-paste(getwd(), "data/Derived/plink_output", sep ="/")if(Operating_system =="windows"){run_command_windows(glue("--bfile dgrp2"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out {shQuote(output_directory)}/dgrp2_QC_all_lines"), path ="data/Input/bfiles/") }if(Operating_system =="mac"){run_command_mac(glue("{plink} --bfile dgrp2"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out ../dgrp2_QC_all_lines"), path ="/data/input/bfiles/") }# Use the shell command 'sed' to remove underscores from the DGRP line names in the .fam file (e.g. 'line_120' becomes 'line120')# Otherwise, these underscores cause trouble when we need to convert from PLINK to vcf format (vcf format uses underscore as a separator)if(Operating_system =="windows"){ for(i in1:2) run_command_no_plink("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path ="/data/Derived/") }if(Operating_system =="mac"){ for(i in1:2) run_command_mac("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path ="/data/Derived/") }# Now impute the missing genotypes using Beagle# This part uses the data for the full DGRP panel of >200 lines, to infer missing genotypes as accurately as possible. # The bigsnpr package provides a helpful wrapper for Beagle called snp_beagleImpute(): it translates to a VCF file and back again using PLINK# Imputation with the below optimisation took about 25 mins on my computer, which is a high spec macbook by 2025 standardssnp_beagleImpute(beagle, plink, bedfile.in ="data/Derived/plink_output/dgrp2_QC_all_lines.bed", bedfile.out ="data/Derived/plink_output/dgrp2_QC_all_lines_imputed.bed",ncores =10, memory.max =32)# assign a sex of 'female' to all the DGRP lines (Beagle removes the sex, and it seems PLINK needs individuals to have a sex, # despite PLINK having a command called --allow-no-sex)if(Operating_system =="windows"){ run_command_windows("sed -i '' 's/ 0 0 0/ 0 0 2/' dgrp2_QC_all_lines_imputed.fam", path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac("sed -i '' 's/ 0 0 0/ 0 0 2/' dgrp2_QC_all_lines_imputed.fam", path ="/data/Derived/plink_output/") }# Re-write the .bed file, to make sure the MAF threshold and minor/major allele designations are correctly assigned post-Beagleif(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed"," --geno 0.1 --maf 0.05 --allow-no-sex", " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path ="/data/Derived/plink_output/") }#unlink(list.files("data/derived", pattern = "~", full.names = TRUE)) # delete the original files, which were given a ~ file name by PLINK}```## Get minor allele frequencies in the DGRP```{r}# Use PLINK to get the allele IDs and calculate the MAFs across the whole DGRP, for all SNPs that survived QC# The file created is called data/derived/plink.frqif(!file.exists("data/Derived/plink_output/plink.frq")){if(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct"," --freq"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct"," --freq"), path ="/data/Derived/plink_output/") }}# Extract and save the MAFs, SNP positions, and major/minor allelesMAFs <-read.table("data/Derived/plink_output/plink.frq", header =TRUE, stringsAsFactors =FALSE) %>%mutate(position =map_chr(strsplit(SNP, split ="_"), function(x) x[2])) %>% dplyr::select(SNP, position, MAF, A1, A2) %>%rename(minor_allele = A1,major_allele = A2) %>%as_tibble()```## Create a reduced list of LD-pruned SNPs with PLINK1,646,615 variants passed the MAF and missingness quality control. However, proximity causes strong linkage disequilibrium, such that neighbouring SNPs tend to have similar associations with the trait under consideration in GWAS. Separate genomic regions can be identified by pruning the number of SNPs within a genomic region using the `plink` arguments `--indep-pairwise 100 10 0.2`, which prunes SNPs within 100kB sliding windows, sliding 10 variants along with each step, and allows a maximum pairwise correlation ($r^2$) threshold of 0.2 between loci within the window. With these parameters, 1,419,773 variants were removed, leaving 226,842.```{r}# indep-pairwise arguments are: # 100kB window size, # variant count to shift the window by 10 variants at the end of each step, # pairwise r^2 threshold of 0.2if(!file.exists("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.out")) {if(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct"," --indep-pairwise 100 10 0.2"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct"," --indep-pairwise 100 10 0.2"), path ="/data/Derived/plink_output/") }}Genomic_regions <-read.table("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in") %>%rename(SNP = V1)```## Build GWAS functionWe use the `--assoc` flag to run a basic linear regression, fit with two data points: the mean phenotype for individuals homozygous for the minor allele and the mean phenotype for individuals homozygous for the major allele. The effect size ($\beta$) is the slope of this regression line. Negative effect sizes indicate that the minor allele is associated with a higher trait value than the major allele. The test statistic `T` and p-value are produced by a Wald test.```{r}run_GWAS <-function(phenotypes){# Make a list of the lines in our sample and save as a text file for passing to PLINK lines_to_keep <- phenotypes %>% dplyr::select(line) %>%mutate(line_2 = line)write.table(lines_to_keep, row.names =FALSE, col.names =FALSE, file ="data/Derived/plink_output/lines_to_keep.txt", quote =FALSE)# Now cull the PLINK files to just the lines that we measured, and re-apply the # MAF cut-off of 0.05 for the new smaller sample of DGRP linesif(Operating_system =="windows"){ run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct"," --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole" --keep lines_to_keep.txt --geno 0.1 --maf 0.05", " --make-bed --out dgrp2_QC_focal_lines"), path ="/data/Derived/plink_output/") }if(Operating_system =="mac"){ run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct"," --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole" --keep lines_to_keep.txt --geno 0.1 --maf 0.05", " --make-bed --out dgrp2_QC_focal_lines"), path ="/data/Derived/plink_output/") }# Define a function to add our phenotype data to a .fam file, which is needed for GWAS analysis and to make sure PLINK includes these samples# The 'phenotypes' data frame needs to have a column called 'line' add_phenotypes_to_fam <-function(filepath, phenotypes){read_delim(filepath, col_names =FALSE, delim =" ") %>% dplyr::select(X1, X2, X3, X4, X5) %>%# Get all the non-phenotype columnsleft_join(phenotypes, by =c("X1"="line")) %>%write.table(file ="data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", col.names =FALSE, row.names =FALSE, quote =FALSE, sep =" ")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")file.rename("data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", "data/Derived/plink_output/dgrp2_QC_focal_lines.fam") }# edit the new FAM file to add the phenotype data from 'phenotypes'add_phenotypes_to_fam("data/Derived/plink_output/dgrp2_QC_focal_lines.fam", phenotypes)# Run GWAS if(Operating_system =="windows"){ run_command_windows("--bfile dgrp2_QC_focal_lines --assoc --maf 0.05 --allow-no-sex", path ="/data/Derived/plink_output") }if(Operating_system =="mac"){ run_command_mac("{plink} --bfile dgrp2_QC_focal_lines --assoc --maf 0.05 --allow-no-sex", path ="/data/Derived/plink_output") }# wrangle the GWAS results Focal_name <-deparse(substitute(phenotypes)) gwas_results <-read.table("data/Derived/plink_output/plink.qassoc", header =TRUE) %>% dplyr::select(SNP, BETA, SE, "T", P)# Rename and compress the GWAS summary stats file # The filter step means that only variants in the LD-pruned subset get saved to disk. gwas_results %>%# filter(SNP %in% (pull(read_tsv("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in", col_names = "SNP"), SNP))) %>% write_tsv(glue("data/Derived/GWAS_results/{Focal_name}.tsv.gz"))unlink("data/Derived/plink_output/plink.qassoc")# Rename the plink log filefile.rename("data/Derived/plink_output/plink.log",glue("data/Derived/plink_output/{Focal_name}_log.txt"))unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bim")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bed")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.log")} ```## Build manhattan plot function```{r}build_manhattan_plot <-function(gwas_results){ manhattan_data <- gwas_results %>%mutate(position =str_split(SNP, "_"), # split the SNP name into logical bitschr =map_chr(position, ~ .x[1]), # the first bit is the chromosome arm - name the column appropriatelyposition =as.numeric(map_chr(position, ~ .x[2])), # where on the chromosome is the SNP foundpval =-1*log10(P)) %>%# make visualising the p values easier dplyr::select(chr, position, pval) %>%filter(chr !="4")# this next chunk finds convenient cuts for labels and colour changes max_pos <- manhattan_data %>%group_by(chr) %>%summarise(max_pos =max(position), middle_pos = (max_pos -min(position)) /2,.groups ="drop") %>%as.data.frame() max_pos$max_pos <-c(0, cumsum(max_pos$max_pos[1:4])) max_pos$label_pos <- max_pos$max_pos + max_pos$middle_pos# combine the two dataframes created above manhattan_data <- manhattan_data %>%left_join(max_pos, by ="chr") %>%mutate(position = position + max_pos,chromosome_colour =case_when(chr =="2L"| chr =="3L"| chr =="X"~"A",.default ="B"),Notable =if_else(pval >=-log10(1e-08), "YES", "NO")) plot <- manhattan_data %>%filter(Notable =="NO") %>%ggplot(aes(position, pval, group = chr, stroke =0.01)) +geom_point(aes(colour = chromosome_colour), size =1.8, alpha =1) +geom_point(data =manhattan_data %>%filter(Notable =="YES"),aes(fill = chromosome_colour), colour ="black", shape =21, size =3.5, alpha =1) +geom_hline(yintercept =-log10(1e-08), linetype =2, colour ="#33271e", linewidth =1, alpha =0.8) +#geom_hline(yintercept = -log10(1e-05), linetype = 2, colour = "#33271e", linewidth = 1, alpha = 0.8) +scale_colour_manual(values =c(met.brewer(name ="Hokusai3")[3], met.brewer(name ="Hokusai3")[6])) +scale_fill_manual(values =c(met.brewer(name ="Hokusai3")[3], met.brewer(name ="Hokusai3")[6])) +scale_x_continuous(breaks = max_pos$label_pos, labels = max_pos$chr) +ylab("-log~10~(_p_)") +xlab("Chromosome and position") +theme_classic() +theme(legend.position ="none",axis.title.y =element_markdown(size =14),axis.title.x =element_markdown(size =14),axis.text.x =element_text(size =12),axis.text.y =element_text(size =12)) }```# Preparing for cross phenotype meta-analysisThe power to detect variants associated with genetically correlated phenotypes can be increased if a meta-analytic approach is adopted ([Zhu et al. 2018](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193256)). Here, we used the cross-phenotype association (hereafter `CPASSOC`) approach developed by [Zhou et al. (2015)](https://www.sciencedirect.com/science/article/pii/S0002929714004777?via%3Dihub), which evaluates the null hypothesis that SNPs are not associated with any of the considered traits, weighted by the sample size of each study and adjusted for the trait genetic correlation matrix. In less language, CPASSOC evaluates the aggregated evidence of an association between a SNP and multiple phenotypes. The steps to apply `CPASSOC` are as follows:1. Estimate $R$, the trait correlation matrix. In the DGRP, this can be done using SNP data (pruned to minimise the effect of linkage disequilibrium) or using line means.2. GWAS each trait separately (see above).3. Collate effect sizes for each trait into a vector $\mathbf{\beta}$ for each SNP.4. Use a Wald test to estimate a test statistic $T_{ijk}$ for the $i^{th}$ SNP, $j^{th}$ cohort and $k^{th}$ treatment condition:$$T_{ijk} = \frac{\hat\beta_{ijk}}{\hat{s}_{ijk}}$$ , where $\hat\beta_{ijk}$ and $\hat{s}_{ijk}$ are the estimated coefficient and standard error for the $i^{th}$ SNP in the $j^{th}$, for the $k^{th}$ treatment condition. From individual test statistics, a vector holding test statistics for all traits ($T$) can be built.5. Test whether $\mathbf{\beta} = \mathbf{0}$. If the trait data are homogeneous (SNPs are expected to affect all traits in the same direction and at the same magnitude), use:$$S_{Hom} = \frac{e^T(RW)^{-1}T(e^T(RW)^{-1}T)^T}{e^T(WRW)^{-1}e}$$where $W$ is a diagonal matrix of weights for the individual test statistics (either the inverse of the variance or in our case the square root of the sample size of the $j^{th}$ cohort: $\sqrt{n_j}$).6. If there is heterogeneity between trait measures (i.e. it is a reasonable expectation that genetic variants could affect some traits in one direction and others in the opposing direction), $S_{Hom}$ is not appropriate. The ideal test statistic in this case is agnostic to the sign of a genetic variant's phenoypic effect and includes only the cohorts and traits with a true contribution to the association of a genetic variant. To implement this, the absolute value $\tau$ is used as a threshold, below which traits are not included in the test statistic. To allow for for effects of different signs in different environmental contexts, let $w_{ijk} = \sqrt{n_j}\times \mathrm{sign}(T_{ijk})$. To calculate this heterogenous summary statistic first find,$$S_{(\tau)} = \frac{e^T(R(\tau)W(\tau))^{-1}T(\tau)(e^T(R(\tau)W(\tau))^{-1}T(\tau))^T}{e^TW(\tau)^{-1}R(\tau)^{-1}W(\tau)^{-1}e}$$This statistic, $S_{Het}$ can be viewed as the maximum of the weighted sum of trait-specific test statistics that satisfy the $\tau$ threshold. When $\tau$ is large, $S(\tau)$ can be undefined if the test statistic falls below $\tau$ in all contexts and cohorts. In this case $S(\tau) = 0$. Our test statistic is then$$\displaystyle S_{Het} = \max_{\tau \gt 0} S(\tau)$$ Note that the restriction imposed by $\tau$ and the sign specific $w_{ijk}$ are the only differences between $S_{Het}$ and $S_{Hom}$.The inclusion of $\tau$ might give the impression of 'cherry picking'. However, the value of $S_{Het}$ lies in increasing power relative to univariate GWAS not by assessing if all tests return a strong, concordant association, but by identifying that more than one phenotype is strongly associated with a SNP. The more phenotypes associated, the larger $S_{Het}$ becomes.To generate *p*-values, $S_{Het}$ is compared to a gamma distribution with a mean shift of test-statistics (see [Zhou et al. (2015)](https://www.sciencedirect.com/science/article/pii/S0002929714004777?via%3Dihub)).Code to implement both statistics in `R` can be downloaded [here](http://hal.case.edu/~xxz10/zhu-web/), and is called below.## The functionsThese are directly loaded from [here](http://hal.case.edu/~xxz10/zhu-web/)```{r}require(compiler)enableJIT(4)Non_Trucated_TestScore <-function(X, SampleSize, CorrMatrix){ Wi =matrix(SampleSize, nrow =1); sumW =sqrt(sum(Wi^2)); W = Wi / sumW; Sigma =ginv(CorrMatrix); XX =apply(X, 1, function(x) { x1 <-matrix(x, ncol =length(x), nrow =1); T = W %*% Sigma %*%t(x1); T = (T*T) / (W %*% Sigma %*%t(W));return(T[1,1]); } );return(XX);}SHom <-cmpfun(Non_Trucated_TestScore);Trucated_TestScore <-function(X, SampleSize, CorrMatrix, correct =1, startCutoff =0, endCutoff =1, CutoffStep =0.05, isAllpossible = T){ N =dim(X)[2]; Wi =matrix(SampleSize, nrow =1); sumW =sqrt(sum(Wi^2)); W = Wi / sumW; XX =apply(X, 1, function(x) { TTT =-1;if (isAllpossible == T ) { cutoff =sort(unique(abs(x))); ## it will filter out any of them. } else { cutoff =seq(startCutoff, endCutoff, CutoffStep); }for (threshold in cutoff) { x1 = x; index =which(abs(x1) < threshold);if (length(index) == N) break; A = CorrMatrix; W1 = W;if (length(index) !=0 ) { x1 = x1[-index]; A = A[-index, -index]; ## update the matrix W1 = W[-index]; }if (correct ==1) { index =which(x1 <0);if (length(index) !=0) { W1[index] =-W1[index]; ## update the sign } } A =ginv(A); x1 =matrix(x1, nrow =1); W1 =matrix(W1, nrow =1); T = W1 %*% A %*%t(x1); T = (T*T) / (W1 %*% A %*%t(W1));if (TTT < T[1,1]) TTT = T[1,1]; }return(TTT); } );return(XX);}SHet <-cmpfun(Trucated_TestScore);EstimateGamma <-function (N =1E6, SampleSize, CorrMatrix, correct =1, startCutoff =0, endCutoff =1, CutoffStep =0.05, isAllpossible = T) { Wi =matrix(SampleSize, nrow =1); sumW =sqrt(sum(Wi^2)); W = Wi / sumW; Permutation =mvrnorm(n = N, mu =c(rep(0, length(SampleSize))), Sigma = CorrMatrix, tol =1e-8, empirical = F); Stat =Trucated_TestScore(X = Permutation, SampleSize = SampleSize, CorrMatrix = CorrMatrix,correct = correct, startCutoff = startCutoff, endCutoff = endCutoff,CutoffStep = CutoffStep, isAllpossible = isAllpossible); a =min(Stat)*3/4 ex3 =mean(Stat*Stat*Stat) V =var(Stat);for (i in1:100){ E =mean(Stat)-a; k = E^2/V theta = V/E a = (-3*k*(k+1)*theta**2+sqrt(9*k**2*(k+1)**2*theta**4-12*k*theta*(k*(k+1)*(k+2)*theta**3-ex3)))/6/k/theta } para =c(k,theta,a);return(para);}EmpDist <-function (N =1E6, SampleSize, CorrMatrix, correct =1, startCutoff =0, endCutoff =1, CutoffStep =0.05, isAllpossible = T) { Wi =matrix(SampleSize, nrow =1); sumW =sqrt(sum(Wi^2)); W = Wi / sumW; Permutation =mvrnorm(n = N, mu =c(rep(0, length(SampleSize))), Sigma = CorrMatrix, tol =1e-8, empirical = F); Stat =Trucated_TestScore(X = Permutation, SampleSize = SampleSize, CorrMatrix = CorrMatrix, correct = correct, startCutoff = startCutoff, endCutoff = endCutoff, CutoffStep = CutoffStep, isAllpossible = isAllpossible);return(Stat);}```# Analysing life expectancy and lifespan equality## Run univariate GWASRun GWAS for each environmental context and save the results.```{r}# prepare phenotype data for GWASArya_f_l <-prep_for_e0_GWAS(Arya_2010_f)Arya_m_l <-prep_for_e0_GWAS(Arya_2010_m)Arya_f_h <-prep_for_h_GWAS(Arya_2010_f)Arya_m_h <-prep_for_h_GWAS(Arya_2010_m)Huang_f_18_l <-prep_for_e0_GWAS(Huang_2020_f_18)Huang_f_18_h <-prep_for_h_GWAS(Huang_2020_f_18)Huang_m_18_l <-prep_for_e0_GWAS(Huang_2020_m_18)Huang_m_18_h <-prep_for_h_GWAS(Huang_2020_m_18)Huang_f_25_l <-prep_for_e0_GWAS(Huang_2020_f_25)Huang_f_25_h <-prep_for_h_GWAS(Huang_2020_f_25)Huang_m_25_l <-prep_for_e0_GWAS(Huang_2020_m_25)Huang_m_25_h <-prep_for_h_GWAS(Huang_2020_m_25)Huang_f_28_l <-prep_for_e0_GWAS(Huang_2020_f_28)Huang_f_28_h <-prep_for_h_GWAS(Huang_2020_f_28)Huang_m_28_l <-prep_for_e0_GWAS(Huang_2020_m_28)Huang_m_28_h <-prep_for_h_GWAS(Huang_2020_m_28)Wilson_f_l_1 <-prep_for_e0_GWAS(Wilson_2020_f_1)Wilson_f_h_1 <-prep_for_h_GWAS(Wilson_2020_f_1)Wilson_f_l_2 <-prep_for_e0_GWAS(Wilson_2020_f_2)Wilson_f_h_2 <-prep_for_h_GWAS(Wilson_2020_f_2)Durham_f_l <-prep_for_e0_GWAS(Durham_2014_f)Durham_f_h <-prep_for_h_GWAS(Durham_2014_f)Patel_f_l <-prep_for_e0_GWAS(Patel_2021_f)Patel_f_h <-prep_for_h_GWAS(Patel_2021_f)# if not already done, run the GWA testsif(!file.exists("data/Derived/GWAS_results/Arya_f_l.tsv.gz")) {run_GWAS(Arya_f_l)run_GWAS(Arya_m_l)run_GWAS(Arya_f_h)run_GWAS(Arya_m_h)run_GWAS(Huang_f_18_l)run_GWAS(Huang_f_18_h)run_GWAS(Huang_m_18_l)run_GWAS(Huang_m_18_h)run_GWAS(Huang_f_25_l)run_GWAS(Huang_f_25_h)run_GWAS(Huang_m_25_l)run_GWAS(Huang_m_25_h)run_GWAS(Huang_f_28_l)run_GWAS(Huang_f_28_h)run_GWAS(Huang_m_28_l)run_GWAS(Huang_m_28_h)run_GWAS(Wilson_f_l_1)run_GWAS(Wilson_f_h_1)run_GWAS(Wilson_f_l_2)run_GWAS(Wilson_f_h_2)run_GWAS(Durham_f_l)run_GWAS(Durham_f_h)run_GWAS(Patel_f_l)run_GWAS(Patel_f_h)}```Load the results```{r}# load GWAS results# Life expectancyArya_f_l_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_f_l.tsv.gz") Huang_f_18_l_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_18_l.tsv.gz")Huang_f_25_l_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_25_l.tsv.gz") Huang_f_28_l_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_28_l.tsv.gz")Wilson_f_l_1_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_l_1.tsv.gz") Wilson_f_l_2_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_l_2.tsv.gz") Durham_f_l_GWAS <-read_tsv("data/Derived/GWAS_results/Durham_f_l.tsv.gz")Patel_f_l_GWAS <-read_tsv("data/Derived/GWAS_results/Patel_f_l.tsv.gz")Arya_m_l_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_m_l.tsv.gz")Huang_m_18_l_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_18_l.tsv.gz")Huang_m_25_l_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_25_l.tsv.gz")Huang_m_28_l_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_28_l.tsv.gz")# Lifespan equalityArya_f_h_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_f_h.tsv.gz")Huang_f_18_h_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_18_h.tsv.gz")Huang_f_25_h_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_25_h.tsv.gz") Huang_f_28_h_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_28_h.tsv.gz")Wilson_f_h_1_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_h_1.tsv.gz")Wilson_f_h_2_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_h_2.tsv.gz")Durham_f_h_GWAS <-read_tsv("data/Derived/GWAS_results/Durham_f_h.tsv.gz")Patel_f_h_GWAS <-read_tsv("data/Derived/GWAS_results/Patel_f_h.tsv.gz")Arya_m_h_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_m_h.tsv.gz")Huang_m_18_h_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_18_h.tsv.gz")Huang_m_25_h_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_25_h.tsv.gz")Huang_m_28_h_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_28_h.tsv.gz")```As a point of comparison, we find the sum of significant associations detected by univariate GWAS**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for **life expectancy**. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.```{r}# filter down to sig associationse0_table <-bind_rows(tibble(`p < 1e-05 variants`=nrow(Arya_f_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_f_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_18_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_18_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_25_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_25_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_28_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_28_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_l_1_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_l_1_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_l_1_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_l_1_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_l_2_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_l_2_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_l_2_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_l_2_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020*",Treatment ="2",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Durham_f_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Durham_f_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Durham et al 2014",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Patel_f_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Patel_f_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Patel et al 2021",Treatment ="1",Sex ="Female",Temperature ="23",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Arya_m_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_m_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_18_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_18_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_25_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_25_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_28_l_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_l_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_28_l_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_l_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`), ) # how many unique variants have been detected?p_05_SNPs_l <-bind_rows( Arya_f_l_GWAS %>%filter(P <1e-05), Arya_m_l_GWAS %>%filter(P <1e-05), Huang_f_18_l_GWAS %>%filter(P <1e-05), Huang_f_25_l_GWAS %>%filter(P <1e-05), Huang_f_28_l_GWAS %>%filter(P <1e-05), Huang_m_18_l_GWAS %>%filter(P <1e-05), Huang_m_25_l_GWAS %>%filter(P <1e-05), Huang_m_28_l_GWAS %>%filter(P <1e-05), Wilson_f_l_1_GWAS %>%filter(P <1e-05), Wilson_f_l_2_GWAS %>%filter(P <1e-05), Durham_f_l_GWAS %>%filter(P <1e-05), Patel_f_l_GWAS %>%filter(P <1e-05) ) %>%distinct(SNP) %>%left_join(Genomic_regions %>%mutate(Pruned_variant ="YES")) e0_table %>%add_row(Study ="Totals",Sex ="",Temperature ="",`p < 1e-05 variants`=nrow(p_05_SNPs_l),`p < 1e-05 genomic regions`=nrow(p_05_SNPs_l %>%filter(Pruned_variant =="YES")),`p < 1e-08 variants`=sum(e0_table$`p < 1e-08 variants`),`p < 1e-08 genomic regions`=sum(e0_table$`p < 1e-08 genomic regions`)) %>%kable() %>%kable_styling()```**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for **lifespan equality**. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.```{r}# filter down to sig associationsh_table <-bind_rows(tibble(`p < 1e-05 variants`=nrow(Arya_f_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_f_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_18_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_18_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_25_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_25_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_28_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_28_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_h_1_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_h_1_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_h_1_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_h_1_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_h_2_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_h_2_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_h_2_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_h_2_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020*",Treatment ="2",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Durham_f_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Durham_f_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Durham et al 2014",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Patel_f_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Patel_f_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Patel et al 2021",Treatment ="1",Sex ="Female",Temperature ="23",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Arya_m_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_m_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_18_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_18_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_25_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_25_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_28_h_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_h_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_28_h_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_h_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`), ) # how many unique variants have been detected?p_05_SNPs_h <-bind_rows( Arya_f_h_GWAS %>%filter(P <1e-05), Arya_m_h_GWAS %>%filter(P <1e-05), Huang_f_18_h_GWAS %>%filter(P <1e-05), Huang_f_25_h_GWAS %>%filter(P <1e-05), Huang_f_28_h_GWAS %>%filter(P <1e-05), Huang_m_18_h_GWAS %>%filter(P <1e-05), Huang_m_25_h_GWAS %>%filter(P <1e-05), Huang_m_28_h_GWAS %>%filter(P <1e-05), Wilson_f_h_1_GWAS %>%filter(P <1e-05), Wilson_f_h_2_GWAS %>%filter(P <1e-05), Durham_f_h_GWAS %>%filter(P <1e-05), Patel_f_h_GWAS %>%filter(P <1e-05) ) %>%distinct(SNP) %>%left_join(Genomic_regions %>%mutate(Pruned_variant ="YES")) h_table %>%add_row(Study ="Totals",Sex ="",Temperature ="",`p < 1e-05 variants`=nrow(p_05_SNPs_h),`p < 1e-05 genomic regions`=nrow(p_05_SNPs_h %>%filter(Pruned_variant =="YES")),`p < 1e-08 variants`=sum(h_table$`p < 1e-08 variants`),`p < 1e-08 genomic regions`=sum(h_table$`p < 1e-08 genomic regions`)) %>%kable() %>%kable_styling()```## Applying cross-phenotype meta-analysis### Generate the genetic correlation matrixWe calculate the genetic correlations between traits using both the line mean and SNP effect size comparisons. Following Zhu et al. (2015), we use the SNP correlations for analysis.```{r}# use the BETA coefficients to build the SNP correlation matrixSNP_beta_e0 <-bind_rows( Arya_f_l_GWAS %>%mutate(Study ="Arya_2010", Sex ="Female", Temperature =25), Huang_f_18_l_GWAS %>%mutate(Study="Huang_2020", Sex="Female", Temperature=18), Huang_f_25_l_GWAS %>%mutate(Study="Huang_2020", Sex="Female", Temperature=25), Huang_f_28_l_GWAS %>%mutate(Study="Huang_2020", Sex="Female", Temperature=28), Wilson_f_l_1_GWAS %>%mutate(Study="Wilson_2020_1", Sex="Female", Temperature=25), Wilson_f_l_2_GWAS %>%mutate(Study="Wilson_2020_2", Sex="Female", Temperature=25), Durham_f_l_GWAS %>%mutate(Study="Durham_2014", Sex="Female", Temperature=25), Patel_f_l_GWAS %>%mutate(Study="Patel_2021", Sex="Female", Temperature=23), Arya_m_l_GWAS %>%mutate(Study="Arya_2010", Sex="Male", Temperature=25), Huang_m_18_l_GWAS %>%mutate(Study="Huang_2020", Sex="Male", Temperature=18), Huang_m_25_l_GWAS %>%mutate(Study="Huang_2020", Sex="Male", Temperature =25), Huang_m_28_l_GWAS %>%mutate(Study ="Huang_2020", Sex ="Male", Temperature =28)) %>% dplyr::select(SNP, BETA, Study, Sex, Temperature) %>%pivot_wider(values_from = BETA, names_from =c(Study, Sex, Temperature)) %>%rename(Arya_f_25 = Arya_2010_Female_25,Huang_f_18 = Huang_2020_Female_18,Huang_f_25 = Huang_2020_Female_25,Huang_f_28 = Huang_2020_Female_28,Wilson_f_25_1 = Wilson_2020_1_Female_25,Wilson_f_25_2 = Wilson_2020_2_Female_25,Durham_f_25 = Durham_2014_Female_25,Patel_f_23 = Patel_2021_Female_23,Arya_m_25 = Arya_2010_Male_25,Huang_m_18 = Huang_2020_Male_18,Huang_m_25 = Huang_2020_Male_25,Huang_m_28 = Huang_2020_Male_28)SNP_beta_e0_LD_pruned <- SNP_beta_e0 %>%inner_join(Genomic_regions)SNP_beta_h <-bind_rows( Arya_f_h_GWAS %>%mutate(Study ="Arya_2010", Sex ="Female", Temperature =25), Huang_f_18_h_GWAS %>%mutate(Study="Huang_2020", Sex="Female", Temperature=18), Huang_f_25_h_GWAS %>%mutate(Study="Huang_2020", Sex="Female", Temperature=25), Huang_f_28_h_GWAS %>%mutate(Study="Huang_2020", Sex="Female", Temperature=28), Wilson_f_h_1_GWAS %>%mutate(Study="Wilson_2020_1", Sex="Female", Temperature=25), Wilson_f_h_2_GWAS %>%mutate(Study="Wilson_2020_2", Sex="Female", Temperature=25), Durham_f_h_GWAS %>%mutate(Study="Durham_2014", Sex="Female", Temperature=25), Patel_f_h_GWAS %>%mutate(Study="Patel_2021", Sex="Female", Temperature=23), Arya_m_h_GWAS %>%mutate(Study="Arya_2010", Sex="Male", Temperature=25), Huang_m_18_h_GWAS %>%mutate(Study="Huang_2020", Sex="Male", Temperature=18), Huang_m_25_h_GWAS %>%mutate(Study="Huang_2020", Sex="Male", Temperature =25), Huang_m_28_h_GWAS %>%mutate(Study ="Huang_2020", Sex ="Male", Temperature =28)) %>% dplyr::select(SNP, BETA, Study, Sex, Temperature) %>%pivot_wider(values_from = BETA, names_from =c(Study, Sex, Temperature)) %>%rename(Arya_f_25 = Arya_2010_Female_25,Huang_f_18 = Huang_2020_Female_18,Huang_f_25 = Huang_2020_Female_25,Huang_f_28 = Huang_2020_Female_28,Wilson_f_25_1 = Wilson_2020_1_Female_25,Wilson_f_25_2 = Wilson_2020_2_Female_25,Durham_f_25 = Durham_2014_Female_25,Patel_f_23 = Patel_2021_Female_23,Arya_m_25 = Arya_2010_Male_25,Huang_m_18 = Huang_2020_Male_18,Huang_m_25 = Huang_2020_Male_25,Huang_m_28 = Huang_2020_Male_28)SNP_beta_h_LD_pruned <- SNP_beta_h %>%inner_join(Genomic_regions)SNP_e0_corr_matrix <-cor(SNP_beta_e0_LD_pruned %>% dplyr::select(-SNP), use ="pairwise.complete.obs", method ="spearman")SNP_h_corr_matrix <-cor(SNP_beta_h_LD_pruned %>% dplyr::select(-SNP), use ="pairwise.complete.obs", method ="spearman")line_data <-bind_rows(Arya_2010_f, Huang_2020_f_18, Huang_2020_f_25, Huang_2020_f_28, Wilson_2020_f_1, Wilson_2020_f_2, Durham_2014_f, Patel_2021_f, Arya_2010_m, Huang_2020_m_18, Huang_2020_m_25, Huang_2020_m_28) %>% dplyr::select(line, Treatment, Sex, Temperature, e0, h) %>%pivot_wider(values_from =c(e0, h), names_from =c(Treatment, Sex, Temperature)) line_data_e0 <- line_data %>% dplyr::select(contains("e0")) %>%rename(Arya_f_25 = e0_Arya_2010_1_Female_25,Huang_f_18 = e0_Huang_2020_1_Female_18,Huang_f_25 = e0_Huang_2020_2_Female_25,Huang_f_28 = e0_Huang_2020_3_Female_28,Wilson_f_25_1 = e0_Wilson_2020_1_Female_25,Wilson_f_25_2 = e0_Wilson_2020_2_Female_25,Durham_f_25 = e0_Durham_2014_1_Female_25,Patel_f_23 = e0_Patel_2021_1_Female_23,Arya_m_25 = e0_Arya_2010_1_Male_25,Huang_m_18 = e0_Huang_2020_1_Male_18,Huang_m_25 = e0_Huang_2020_2_Male_25,Huang_m_28 = e0_Huang_2020_3_Male_28)line_data_h <- line_data %>% dplyr::select(!contains("e0"), -line) %>%rename(Arya_f_25 = h_Arya_2010_1_Female_25,Huang_f_18 = h_Huang_2020_1_Female_18,Huang_f_25 = h_Huang_2020_2_Female_25,Huang_f_28 = h_Huang_2020_3_Female_28,Wilson_f_25_1 = h_Wilson_2020_1_Female_25,Wilson_f_25_2 = h_Wilson_2020_2_Female_25,Durham_f_25 = h_Durham_2014_1_Female_25,Patel_f_23 = h_Patel_2021_1_Female_23,Arya_m_25 = h_Arya_2010_1_Male_25,Huang_m_18 = h_Huang_2020_1_Male_18,Huang_m_25 = h_Huang_2020_2_Male_25,Huang_m_28 = h_Huang_2020_3_Male_28)line_e0_corr_matrix <-cor(line_data_e0, use ="pairwise.complete.obs", method ="spearman")line_h_corr_matrix <-cor(line_data_h, use ="pairwise.complete.obs", method ="spearman")```Let's visualise the genetic correlation between lifespan measures. First for life expectancy:```{r}breaksList <-seq(-1, 1, by =0.02)pheatmap(SNP_e0_corr_matrix, breaks = breaksList, main ="", legend_labels =c("-1", "-0.5", "0", "0.5", "Genetic correlation\n"),color =colorRampPalette(rev(met.brewer("Benedictus", direction =1)))(length(breaksList)),legend =TRUE, cutree_rows =3, cutree_cols =3, angle_col =45, border_color ="white")```Now for lifespan equality```{r}pheatmap(SNP_h_corr_matrix, breaks = breaksList, main ="", legend_labels =c("-1", "-0.5", "0", "0.5", "Genetic correlation\n"),color =colorRampPalette(rev(met.brewer("Benedictus", direction =1)))(length(breaksList)),legend =TRUE, cutree_rows =3, cutree_cols =3, angle_col =45, border_color ="white")```### Calculate meta-analytic test statisticsThe purpose of this meta-analysis is to search for SNPs that have some effect on life expectancy or lifespan equality, expressed in many different contexts (sexes, temperatures and mating status').To conduct CPASSOC for a given SNP, we need a $T$ statistic from each environmental context. A different number of lines were included in each GWAS, which caused small differences in the number of SNPs assessed in each cohort. We therefore prune the list of SNPs to those included in all univariate analyses. After pruning, 1,603,213 SNPs remain.The Bonferroni adjusted significance threshold for this number of tests is $p_{adj} = \frac{0.05}{1603213} = 3.12\times 10^{-8}$; here and for all future analysis, we use *p* $< 10^{-8}$ as our significance threshold, as this is slightly more conservative and easier to quickly interpret.#### Life expectancy```{r}Arya_f_l_T <- Arya_f_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)Huang_f_18_l_T <- Huang_f_18_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)Huang_f_25_l_T <- Huang_f_25_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)Huang_f_28_l_T <- Huang_f_28_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)Wilson_f_l_1_T <- Wilson_f_l_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_1 = T)Wilson_f_l_2_T <- Wilson_f_l_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_2 = T)Durham_f_l_T <- Durham_f_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f_25 = T)Patel_f_l_T <- Patel_f_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f_23 = T)Arya_m_l_T <- Arya_m_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)Huang_m_18_l_T <- Huang_m_18_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)Huang_m_25_l_T <- Huang_m_25_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)Huang_m_28_l_T <- Huang_m_28_l_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)all_e0_t_stats <- Arya_f_l_T %>%inner_join(Huang_f_18_l_T, by ="SNP") %>%inner_join(Huang_f_25_l_T, by ="SNP") %>%inner_join(Huang_f_28_l_T, by ="SNP") %>%inner_join(Wilson_f_l_1_T, by ="SNP") %>%inner_join(Wilson_f_l_2_T, by ="SNP") %>%inner_join(Durham_f_l_T, by ="SNP") %>%inner_join(Patel_f_l_T, by ="SNP") %>%inner_join(Arya_m_l_T, by ="SNP") %>%inner_join(Huang_m_18_l_T, by ="SNP") %>%inner_join(Huang_m_25_l_T, by ="SNP") %>%inner_join(Huang_m_28_l_T, by ="SNP")all_e0_t_stats_values <- all_e0_t_stats %>% dplyr::select(2:13)Sample_size_all <-c(165, 183, 186, 177, 161, 161, 176, 193, 165, 183, 186, 177) if(!file.exists("data/Derived/GWAS_results/all_e0_meta_results.csv")) {# run the homogeneous effect meta-analysisS_hom <-SHom(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_all, SNP_e0_corr_matrix);S_het <-SHet(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)# bind meta statistics with the univariate effect sizesall_e0_meta_results <- all_e0_t_stats %>%bind_cols(p_S_hom, p_S_het) write_csv(all_e0_meta_results, "data/Derived/GWAS_results/all_e0_meta_results.csv")} else all_e0_meta_results <-read_csv("data/Derived/GWAS_results/all_e0_meta_results.csv")```#### Lifespan equality```{r}Arya_f_h_T <- Arya_f_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)Huang_f_18_h_T <- Huang_f_18_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)Huang_f_25_h_T <- Huang_f_25_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)Huang_f_28_h_T <- Huang_f_28_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)Wilson_f_h_1_T <- Wilson_f_h_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_1 = T)Wilson_f_h_2_T <- Wilson_f_h_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_25_2 = T)Durham_f_h_T <- Durham_f_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f_25 = T)Patel_f_h_T <- Patel_f_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f_23 = T)Arya_m_h_T <- Arya_m_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)Huang_m_18_h_T <- Huang_m_18_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)Huang_m_25_h_T <- Huang_m_25_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)Huang_m_28_h_T <- Huang_m_28_h_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)all_h_t_stats <- Arya_f_h_T %>%inner_join(Huang_f_18_h_T, by ="SNP") %>%inner_join(Huang_f_25_h_T, by ="SNP") %>%inner_join(Huang_f_28_h_T, by ="SNP") %>%inner_join(Wilson_f_h_1_T, by ="SNP") %>%inner_join(Wilson_f_h_2_T, by ="SNP") %>%inner_join(Durham_f_h_T, by ="SNP") %>%inner_join(Patel_f_h_T, by ="SNP") %>%inner_join(Arya_m_h_T, by ="SNP") %>%inner_join(Huang_m_18_h_T, by ="SNP") %>%inner_join(Huang_m_25_h_T, by ="SNP") %>%inner_join(Huang_m_28_h_T, by ="SNP") all_h_t_stats_values <- all_h_t_stats %>% dplyr::select(2:13)if(!file.exists("data/Derived/GWAS_results/all_h_meta_results.csv")) {S_hom <-SHom(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_all, SNP_h_corr_matrix);S_het <-SHet(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)# bind meta statistics with the univariate effect sizesall_h_meta_results <- all_h_t_stats %>%bind_cols(p_S_hom, p_S_het)write_csv(all_h_meta_results, "data/Derived/GWAS_results/all_h_meta_results.csv")} else all_h_meta_results <-read_csv("data/Derived/GWAS_results/all_h_meta_results.csv")```## Visualise the resultsWe combine GWAS summary statistics calculated from lifespan data measured across different contexts. It's likely that some SNPs have G x E interactions that would lead to a heterogeneous effect across treatments. We therefore utilise the `S_het` calculated p-values.First lets show the effect of `CPASSOC` on the number of variants found to be associated with life expectancy and lifespan equality.**Table SX**. the number of variants associated with life expectancy and lifespan equality at various significance thresholds, estimated by univariate GWAS or CPASSOC.```{r}tibble(Analysis =c("CPASSOC", "Univariate", "CPASSOC", "Univariate"),Trait =c("Life expectancy", "Life expectancy", "Lifespan equality", "Lifespan equality"),`p < 1e-05 variants`=c(sum(all_e0_meta_results$meta_p_het <1e-05),nrow(p_05_SNPs_l),sum(all_h_meta_results$meta_p_het <1e-05),nrow(p_05_SNPs_h)),`p < 1e-05 genomic regions`=c(nrow(all_e0_meta_results %>%filter(meta_p_het <1e-05) %>%inner_join(Genomic_regions)),nrow(p_05_SNPs_l %>%filter(Pruned_variant =="YES")),nrow(all_h_meta_results %>%filter(meta_p_het <1e-05) %>%inner_join(Genomic_regions)),nrow(p_05_SNPs_h %>%filter(Pruned_variant =="YES"))),`p < 1e-08 variants`=c(sum(all_e0_meta_results$meta_p_het <1e-08),sum(e0_table$`p < 1e-08 variants`),sum(all_h_meta_results$meta_p_het <1e-08),sum(h_table$`p < 1e-08 variants`)),`p < 1e-08 genomic regions`=c(nrow(all_e0_meta_results %>%filter(meta_p_het <1e-08) %>%inner_join(Genomic_regions)),sum(h_table$`p < 1e-08 variants`),nrow(all_h_meta_results %>%filter(meta_p_het <1e-08) %>%inner_join(Genomic_regions)),sum(h_table$`p < 1e-08 variants`))) %>%kable() %>%kable_styling()```**Table SX**. genes that encompass or are very close to the genetic variants that have strong associations with life expectancy.```{r}# join gene annotations with the list of analysed variants # note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.life_expectancy_variants <- all_e0_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^18, 3)/10^18,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)life_expectancy_variants %>%my_data_table()```**Table SX**. genes that encompass or are very close to the genetic variants that have strong associations with lifespan equality.```{r}# join gene annotations with the list of analysed variants # note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.lifespan_equality_variants <- all_h_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^15, 3)/10^15,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)lifespan_equality_variants %>%my_data_table()```Now build some 'Manhattan plots' to show where these significant associations can be found:```{r, fig.width=11, eval=TRUE}#| column: pagee0_results <- all_e0_meta_results %>% inner_join(Genomic_regions) %>% dplyr::select(SNP, meta_p_hom, meta_p_het) %>% rename(P = meta_p_het) %>% mutate(logp = -log10(P))h_results <- all_h_meta_results %>% dplyr::select(SNP, meta_p_hom, meta_p_het) %>% inner_join(Genomic_regions) %>% dplyr::select(SNP, meta_p_hom, meta_p_het) %>% rename(P = meta_p_het) %>% mutate(logp = -log10(P))# plot the results using the manhattan plot custom function we defined earliere0_het_plot <- build_manhattan_plot(e0_results) + labs(title = "Life expectancy") + theme(plot.title = element_text(size = 20, hjust = 0.5)) + scale_y_continuous(limits = c(0, 21), expand = c(0, 0))h_het_plot <- build_manhattan_plot(h_results) + labs(title = "Lifespan equality") + theme(plot.title = element_text(size = 20, hjust = 0.5)) + scale_y_continuous(limits = c(0, 21), expand = c(0, 0))e0_het_plot + h_het_plot + plot_annotation(tag_levels = "A")```**Figure XX**. Manhattan plot showing the -Log~10~ *p*-value for each genomic region's effect on A) life expectancy and B) lifespan equality.Plot the univariate effect sizes for each of the regions associated with life expectancy / lifespan equality at the genome-wide significance threshold (p \< $0.05^{-8}$) after CPASSOC.**Life expectancy**```{r, fig.height=9}SNP_heatmap_e0 <- SNP_beta_e0 %>% inner_join( all_e0_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% inner_join(Genomic_regions))row_name <- SNP_heatmap_e0$SNPSNP_heatmap_e0 <- SNP_heatmap_e0 %>% dplyr::select(-SNP) %>% as.matrix()rownames(SNP_heatmap_e0) <- row_namebreaksList <- seq(-7, 7, by = 0.01)annotation_SNPs <- all_e0_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L", str_detect(SNP, "2R") ~ "2R", str_detect(SNP, "3L") ~ "3L", str_detect(SNP, "3R") ~ "3R", str_detect(SNP, "X") ~ "X"))annotation_studies <- tibble(Study = c("Arya_f_25", "Huang_f_18", "Huang_f_25", "Huang_f_28", "Wilson_f_25_1", "Wilson_f_25_2", "Durham_f_25", "Patel_f_23", "Arya_m_25", "Huang_m_18", "Huang_m_25", "Huang_m_28"), Temperature = c("25", "18", "25", "28", "25", "25", "25", "23", "25", "18", "25", "28")) %>% mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female", .default = "Male"), Mating = case_when(str_detect(Study, "Arya") ~ "NO", str_detect(Study, "Huang") ~ "Throughout life", str_detect(Study, "Wilson") ~ "Early life", str_detect(Study, "Durham") ~ "Throughout life", str_detect(Study, "Patel") ~ "Early life"), Author = str_extract(Study, ".*(?=\\_)"), Author = str_remove(Author, "_f"), Author = str_remove(Author, "_m"))# create a study annotation column, need this to be a data.frame rather than a tibble for some reason Study_details <- annotation_studies %>% as.data.frame() %>% dplyr::select(Study, Temperature, Mating)my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2], Mating = Study_details[, 3])my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw "23" = "#d0e2af", "25" = "#f5db99", "28" = "#e89c81"), Mating = c("NO" = "#f8e3d1", # Shuksan from pnw "Early life" = "#d7b1c5", "Throughout life" = "#ac8eab"), Chromosome = c("2L" = "#d8aedd", # lake colours from pnw "2R" = "#cb74ad", "3L" = "#11c2b5", "3R" = "#72e1e1", "X" = "#fbcc74"))# create a SNP annotation columnSNP_details <- annotation_SNPs %>% as.data.frame()my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females", "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females", "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males", "Huang et al males")pheatmap(SNP_heatmap_e0, breaks = breaksList, main = "", color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)), legend = TRUE, cutree_rows = 6, cutree_cols = 5, angle_col = 45, border_color = "white", annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories, fontsize = 8, labels_col = my_col_names)```**Figure SX**. univariate effect sizes for each of the genomic regions associated with life expectancy at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC. Effect sizes are expressed in days added to life expectancy per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.**Lifespan equality**```{r}SNP_heatmap_h <- SNP_beta_h %>%inner_join( all_h_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%inner_join(Genomic_regions))row_name <- SNP_heatmap_h$SNPSNP_heatmap_h <- SNP_heatmap_h %>% dplyr::select(-SNP) %>%as.matrix()rownames(SNP_heatmap_h) = row_namebreaksList <-seq(-0.15, 0.15, by =0.001)annotation_SNPs_h <- all_h_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP) %>%mutate(Chromosome =case_when(str_detect(SNP, "2L") ~"2L",str_detect(SNP, "2R") ~"2R",str_detect(SNP, "3L") ~"3L",str_detect(SNP, "3R") ~"3R",str_detect(SNP, "X") ~"X"))# create a SNP annotation columnSNP_details_h <- annotation_SNPs_h %>%as.data.frame()my_SNP_categories_h <-data.frame(row.names = SNP_details_h[, 1], Chromosome = SNP_details_h[, 2])pheatmap(SNP_heatmap_h, breaks = breaksList, main ="",color =colorRampPalette(rev(met.brewer("Benedictus", direction =1)))(length(breaksList)),legend =TRUE, cutree_rows =3, cutree_cols =4, angle_col =45, border_color ="white",annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories_h,fontsize =8, labels_col = my_col_names)```**Figure XX**. univariate effect sizes for each of the genomic regions associated with lifespan equality at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC. Effect sizes are expressed in **equality added** per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.# Analysing the rate of ageing and baseline mortality## Axes of ageing rate and baseline mortalityWe've shown that orthogonal deviation from the regression of lifespan equality on life expectancy closely corresponds to the rate of ageing ($\beta$) parameter in a Gompertz-Makeham mortality model. To identify regions of the genome associated with the rate of ageing, we can calculate a rate of ageing index for each line in each treatment. To create this index, we rotate the coordinate system of the life expectancy and lifespan equality plane by $\theta$ degrees, where $\theta$ is the angle between the positive x-axis and the regression of lifespan equality on life expectancy.**Finding the slopes**```{r}# create a dataframe with which mean regression lines can be predicted from each model. It spans 4 SDs in either direction. nd <-tibble(e0 =seq(from =0, to =180, length.out =180))# fit the modelsArya_f_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Arya_2010_f, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Arya_f_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Arya_f_slope <-as_draws_df(Arya_f_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Arya_regression_line_f <-fitted(Arya_f_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Arya_m_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Arya_2010_m, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Arya_m_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Arya_m_slope <-as_draws_df(Arya_m_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Arya_regression_line_m <-fitted(Arya_m_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Huang_f_18_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Huang_2020_f_18, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Huang_f_18_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Huang_f_18_slope <-as_draws_df(Huang_f_18_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Huang_f_18_regression_line <-fitted(Huang_f_18_model, newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Huang_m_18_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Huang_2020_m_18, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Huang_m_18_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Huang_m_18_slope <-as_draws_df(Huang_m_18_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Huang_m_18_regression_line <-fitted(Huang_m_18_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Huang_f_25_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Huang_2020_f_25, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Huang_f_25_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Huang_f_25_slope <-as_draws_df(Huang_f_25_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Huang_f_25_regression_line <-fitted(Huang_f_25_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Huang_m_25_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Huang_2020_m_25, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Huang_m_25_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Huang_m_25_slope <-as_draws_df(Huang_m_25_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Huang_m_25_regression_line <-fitted(Huang_m_25_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Huang_f_28_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Huang_2020_f_28, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Huang_f_28_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Huang_f_28_slope <-as_draws_df(Huang_f_28_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Huang_f_28_regression_line <-fitted(Huang_f_28_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Huang_m_28_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Huang_2020_m_28, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Huang_m_28_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Huang_m_28_slope <-as_draws_df(Huang_m_28_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Huang_m_28_regression_line <-fitted(Huang_m_28_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Wilson_f_model_1 <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Wilson_2020_f_1, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Wilson_f_slope_1",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Wilson_f_slope_1 <-as_draws_df(Wilson_f_model_1) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Wilson_f_regression_line_1 <-fitted(Wilson_f_model_1,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Wilson_f_model_2 <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Wilson_2020_f_2, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Wilson_f_slope_2",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Wilson_f_slope_2 <-as_draws_df(Wilson_f_model_2) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Wilson_f_regression_line_2 <-fitted(Wilson_f_model_2,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Durham_f_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Durham_2014_f, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Durham_f_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Durham_f_slope <-as_draws_df(Durham_f_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Durham_f_regression_line <-fitted(Durham_f_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)Patel_f_model <-brm(h ~1+ e0,#prior = c(prior(normal(0, 0.1), class = Intercept),# prior(normal(0, 1), class = b),# prior(exponential(1), class = sigma)),family = gaussian,iter =6000, warmup =2000,control =list(adapt_delta =0.8, max_treedepth =10),data = Patel_2021_f, chains =4, cores =4, file ="data/Derived/Ageing_axis_slopes/Patel_f_slope",backend ="cmdstanr", stan_model_args =list(stanc_options =list("O1")),refresh =400, silent =0, seed =1)Patel_f_slope <-as_draws_df(Patel_f_model) %>%as_tibble() %>% dplyr::select(b_e0) %>%summarise(slope =mean(b_e0)) %>%pull(slope) Patel_regression_line <-fitted(Patel_f_model,newdata = nd) %>%data.frame() %>%bind_cols(nd) %>% dplyr::select(Estimate, e0)```With regression coefficients found, we use the following formula to calculate the angle (in radians) between the mean regression line and the x-axis:$\theta = tan^{-1}(\beta)$where $\beta$ is the point estimate of the slope from each model posterior distribution.```{r}Arya_f_angle <-atan(Arya_f_slope)Arya_m_angle <-atan(Arya_m_slope)Huang_f_18_angle <-atan(Huang_f_18_slope)Huang_m_18_angle <-atan(Huang_m_18_slope)Huang_f_25_angle <-atan(Huang_f_25_slope)Huang_m_25_angle <-atan(Huang_m_25_slope)Huang_f_28_angle <-atan(Huang_f_28_slope)Huang_m_28_angle <-atan(Huang_m_28_slope)Wilson_f_1_angle <-atan(Wilson_f_slope_1)Wilson_f_2_angle <-atan(Wilson_f_slope_2)Durham_f_angle <-atan(Durham_f_slope)Patel_f_angle <-atan(Patel_f_slope)```We then rotated the coordinate system of the life expectancy and lifespan equality plane clockwise by $\theta$:$$x' = -(x\cos(\theta) + y\sin(\theta))$$ $$y' = -(x\sin(\theta) - y\cos(\theta))$$where $x'$ and $y'$ are the vectors of genotype means for baseline mortality rate and ageing rate, and $x$ and $y$ are vectors of genotype means for life expectancy and lifespan equality. We perform this transformation on the unscaled data.```{r}Arya_2010_f <- Arya_2010_f %>%mutate(baseline_mortality_axis =-1*(e0*cos(Arya_f_angle) + h*sin(Arya_f_angle)),ageing_axis =-1*(e0*sin(Arya_f_angle) - h*cos(Arya_f_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Arya_2010_m <- Arya_2010_m %>%mutate(baseline_mortality_axis =-1*(e0*cos(Arya_m_angle) + h*sin(Arya_m_angle)),ageing_axis =-1*(e0*sin(Arya_m_angle) - h*cos(Arya_m_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Huang_2020_f_18 <- Huang_2020_f_18 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Huang_f_18_angle) + h*sin(Huang_f_18_angle)),ageing_axis =-1*(e0*sin(Huang_f_18_angle) - h*cos(Huang_f_18_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Huang_2020_m_18 <- Huang_2020_m_18 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Huang_m_18_angle) + h*sin(Huang_m_18_angle)),ageing_axis =-1*(e0*sin(Huang_m_18_angle) - h*cos(Huang_m_18_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Huang_2020_f_25 <- Huang_2020_f_25 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Huang_f_25_angle) + h*sin(Huang_f_25_angle)),ageing_axis =-1*(e0*sin(Huang_f_25_angle) - h*cos(Huang_f_25_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Huang_2020_m_25 <- Huang_2020_m_25 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Huang_m_25_angle) + h*sin(Huang_m_25_angle)),ageing_axis =-1*(e0*sin(Huang_m_25_angle) - h*cos(Huang_m_25_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Huang_2020_f_28 <- Huang_2020_f_28 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Huang_f_28_angle) + h*sin(Huang_f_28_angle)),ageing_axis =-1*(e0*sin(Huang_f_28_angle) - h*cos(Huang_f_28_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Huang_2020_m_28 <- Huang_2020_m_28 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Huang_m_28_angle) + h*sin(Huang_m_28_angle)),ageing_axis =-1*(e0*sin(Huang_m_28_angle) - h*cos(Huang_m_28_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Wilson_2020_f_1 <- Wilson_2020_f_1 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Wilson_f_1_angle) + h*sin(Wilson_f_1_angle)),ageing_axis =-1*(e0*sin(Wilson_f_1_angle) - h*cos(Wilson_f_1_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Wilson_2020_f_2 <- Wilson_2020_f_2 %>%mutate(baseline_mortality_axis =-1*(e0*cos(Wilson_f_2_angle) + h*sin(Wilson_f_2_angle)),ageing_axis =-1*(e0*sin(Wilson_f_2_angle) - h*cos(Wilson_f_2_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Durham_2014_f <- Durham_2014_f %>%mutate(baseline_mortality_axis =-1*(e0*cos(Durham_f_angle) + h*sin(Durham_f_angle)),ageing_axis =-1*(e0*sin(Durham_f_angle) - h*cos(Durham_f_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))Patel_2021_f <- Patel_2021_f %>%mutate(baseline_mortality_axis =-1*(e0*cos(Patel_f_angle) + h*sin(Patel_f_angle)),ageing_axis =-1*(e0*sin(Patel_f_angle) - h*cos(Patel_f_angle)),baseline_mortality_axis_centered = baseline_mortality_axis -mean(baseline_mortality_axis),ageing_axis_centered = ageing_axis -mean(ageing_axis))```Finally, simulate curves from for the Gompertz-Makeham distribution to show the correlation between the $\alpha$ and $\beta$ parametrs and our baseline mortality and ageing rate proxies.```{r}# script to draw h~e0 for different gompertz b# a sequencea_seq <-seq(-30,2,0.02)# b sequenceb_seq <-seq(-5,-0.5,0.5)b_seq <-exp(b_seq)gomp_seq <-data.frame(b=NULL,e0=NULL,h=NULL)age_seq <-seq(0,10000,0.1)Run_sim <-FALSE# change to TRUE to run the simif(Run_sim){for(i in1:length(b_seq)){for (j in1:length(a_seq)){ lx <-exp(-exp(a_seq[j])/b_seq[i]*(exp(b_seq[i]*age_seq)-1)) lx <- lx[lx!=0]if(tail(lx,1)<0.1){ e0_gomp <-sum(lx)*0.1 disparity <--sum(lx*log(lx))*0.1 h_gomp <--log(disparity/e0_gomp) } gomp_seq <-rbind(gomp_seq,c(b_seq[i],e0_gomp,h_gomp)) } }write_csv(gomp_seq, "data/Derived/gompertz_simulation.csv")} else{ gomp_seq <-read_csv("data/Derived/gompertz_simulation.csv")}names(gomp_seq) <-c("b","e0","h")gomp_seq$b <-log(gomp_seq$b)gomp_seq$b <-as.factor(gomp_seq$b)```Plot the line means, coloured by their value on the ageing rate axis.```{r, fig.height=10, fig.width=12}rotated_axis_plot <- function(data, regression_data, which_axis, fill_title, study_title, limit){ data %>% ggplot(aes(x = e0, y = h)) + geom_line(data = gomp_seq, aes(x = e0, y = h, group = b), alpha = 0.4, linetype = 2) + geom_point(aes(fill = which_axis), shape = 21, size = 4) + scale_fill_moma_c("Avedon", direction = -1, limits = c(-1*limit, limit)) + geom_smooth(data = regression_data, aes(y = Estimate), stat = "identity", alpha = 1/2, linewidth = 1) + scale_x_continuous(limits = c(5, 145), breaks = c(0, 30, 60, 90, 120), expand = c(0, 0)) + scale_y_continuous(limits = c(0.1, 3.5), breaks = c(1, 2, 3), expand = c(0, 0)) + labs(fill = fill_title, x = "Life expectancy", y = "Lifespan equality", title = study_title) + theme_bw() + theme(plot.title = element_text(hjust = 0.5), panel.grid = element_blank(), axis.title.y = element_markdown(size = 12), axis.title.x = element_markdown(size = 12), axis.text.x = element_text(size = 10), axis.text.y = element_text(size = 10))}a <- rotated_axis_plot(Arya_2010_f, Arya_regression_line_f, which_axis = Arya_2010_f$ageing_axis_centered, "Ageing\nrate", "Arya 25C females", limit = 1.6)a.1 <- rotated_axis_plot(Arya_2010_f, Arya_regression_line_f, which_axis = Arya_2010_f$baseline_mortality_axis_centered, "Baseline\nmortality", "Arya 25C females", limit = 60)b <- rotated_axis_plot(Arya_2010_m, Arya_regression_line_m, which_axis = Arya_2010_m$ageing_axis_centered, "Ageing\nrate", "Arya 25C males", limit = 1.6) #+ #coord_cartesian(xlim = c(20, 80), ylim = c(0.7, 3.3))b.1 <- rotated_axis_plot(Arya_2010_m, Arya_regression_line_m, which_axis = Arya_2010_m$baseline_mortality_axis_centered, "Baseline\nmortality", "Arya 25C males", limit = 60) #+ #coord_cartesian(xlim = c(20, 80), ylim = c(0.7, 3.3))c <- rotated_axis_plot(Huang_2020_f_18, Huang_f_18_regression_line, which_axis = Huang_2020_f_18$ageing_axis_centered, "Ageing\nrate", "Huang 18C females", limit = 1.6) #+ #coord_cartesian(xlim = c(20, 135), ylim = c(0.4, 2.5))c.1 <- rotated_axis_plot(Huang_2020_f_18, Huang_f_18_regression_line, which_axis = Huang_2020_f_18$baseline_mortality_axis_centered, "Baseline\nmortality", "Huang 18C females", limit = 60) #+ #coord_cartesian(xlim = c(20, 135), ylim = c(0.4, 2.5))d <- rotated_axis_plot(Huang_2020_m_18, Huang_m_18_regression_line, which_axis = Huang_2020_m_18$ageing_axis_centered, "Ageing\nrate", "Huang 18C males", limit = 1.6) #+ #coord_cartesian(xlim = c(30, 140), ylim = c(0.3, 2.5))d.1 <- rotated_axis_plot(Huang_2020_m_18, Huang_m_18_regression_line, which_axis = Huang_2020_m_18$baseline_mortality_axis_centered, "Baseline\nmortality", "Huang 18C males", limit = 60) #+ #coord_cartesian(xlim = c(30, 140), ylim = c(0.3, 2.5))e <- rotated_axis_plot(Huang_2020_f_25, Huang_f_25_regression_line, which_axis = Huang_2020_f_25$ageing_axis_centered, "Ageing\nrate", "Huang 25C females", limit = 1.6) #+ #coord_cartesian(xlim = c(10, 70), ylim = c(0.4, 3))e.1 <- rotated_axis_plot(Huang_2020_f_25, Huang_f_25_regression_line, which_axis = Huang_2020_f_25$baseline_mortality_axis_centered, "Baseline\nmortality", "Huang 25C females", limit = 60) #+ #coord_cartesian(xlim = c(10, 70), ylim = c(0.4, 3))f <- rotated_axis_plot(Huang_2020_m_25, Huang_m_25_regression_line, which_axis = Huang_2020_m_25$ageing_axis_centered, "Ageing\nrate", "Huang 25C males", limit = 1.6) #+ #coord_cartesian(xlim = c(15, 80), ylim = c(0.5, 2.5))f.1 <- rotated_axis_plot(Huang_2020_m_25, Huang_m_25_regression_line, which_axis = Huang_2020_m_25$baseline_mortality_axis_centered, "Baseline\nmortality", "Huang 25C males", limit = 60) #+ #coord_cartesian(xlim = c(15, 80), ylim = c(0.5, 2.5))g <- rotated_axis_plot(Huang_2020_f_28, Huang_f_28_regression_line, which_axis = Huang_2020_f_28$ageing_axis_centered, "Ageing\nrate", "Huang 28C females", limit = 1.6) #+ #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))g.1 <- rotated_axis_plot(Huang_2020_f_28, Huang_f_28_regression_line, which_axis = Huang_2020_f_28$baseline_mortality_axis_centered, "Baseline\nmortality", "Huang 28C females", limit = 60) #+ #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))h <- rotated_axis_plot(Huang_2020_m_28, Huang_m_28_regression_line, which_axis = Huang_2020_m_28$ageing_axis_centered, "Ageing\nrate", "Huang 28C males", limit = 1.6) #+ #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))h.1 <- rotated_axis_plot(Huang_2020_m_28, Huang_m_28_regression_line, which_axis = Huang_2020_m_28$baseline_mortality_axis_centered, "Baseline\nmortality", "Huang 28C males", limit = 60) #+ #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))i <- rotated_axis_plot(Wilson_2020_f_1, Wilson_f_regression_line_1, which_axis = Wilson_2020_f_1$ageing_axis_centered, "Ageing\nrate", "Wilson 25C females 1", limit = 1.6) #+ #coord_cartesian(xlim = c(15, 75), ylim = c(0.4, 2.5))i.1 <- rotated_axis_plot(Wilson_2020_f_1, Wilson_f_regression_line_1, which_axis = Wilson_2020_f_1$baseline_mortality_axis_centered, "Baseline\nmortality", "Wilson 25C females 1", limit = 60) #+ #coord_cartesian(xlim = c(15, 75), ylim = c(0.4, 2.5))j <- rotated_axis_plot(Wilson_2020_f_2, Wilson_f_regression_line_2, which_axis = Wilson_2020_f_2$ageing_axis_centered, "Ageing\nrate", "Wilson 25C females 2", limit = 1.6) #+ #coord_cartesian(xlim = c(5, 55), ylim = c(0.1, 2.5))j.1 <- rotated_axis_plot(Wilson_2020_f_2, Wilson_f_regression_line_2, which_axis = Wilson_2020_f_2$baseline_mortality_axis_centered, "Baseline\nmortality", "Wilson 25C females 2", limit = 60) #+ #coord_cartesian(xlim = c(5, 55), ylim = c(0.1, 2.5))k <- rotated_axis_plot(Durham_2014_f, Durham_f_regression_line, which_axis = Durham_2014_f$ageing_axis_centered, "Ageing\nrate", "Durham 25C females", limit = 1.6) #+ #coord_cartesian(xlim = c(15, 65), ylim = c(1.1, 2.3))k.1 <- rotated_axis_plot(Durham_2014_f, Durham_f_regression_line, which_axis = Durham_2014_f$baseline_mortality_axis_centered, "Baseline\nmortality", "Durham 25C females", limit = 60) #+ #coord_cartesian(xlim = c(15, 65), ylim = c(1.1, 2.3))l <- rotated_axis_plot(Patel_2021_f, Patel_regression_line, which_axis = Patel_2021_f$ageing_axis_centered, "Ageing\nrate", "Patel 23C females", limit = 1.6) #+ #coord_cartesian(xlim = c(10, 75), ylim = c(0.1, 3.3))l.1 <- rotated_axis_plot(Patel_2021_f, Patel_regression_line, which_axis = Patel_2021_f$baseline_mortality_axis_centered, "Baseline\nmortality", "Patel 23C females", limit = 60) #+ #coord_cartesian(xlim = c(10, 75), ylim = c(0.1, 3.3))``````{r}(a | b | c) / (d | e | f) / (g | h| i) / (j | k | l) +#guide_area()) +plot_layout(guides ='collect')```**Figure SX**. Points show DGRP lines, shaded by their genotypic values for the rate of ageing. Dashed curves show simulation outcomes from a Gompertz-Makeham distribution: the rate of ageing differs between curves but is fixed within them, where the baseline mortality decreases as curves progress to the right. Note that colour shows the rate of ageing relative to the mean within the treatment.```{r}(a.1| b.1| c.1) / (d.1| e.1| f.1) / (g.1| h.1| i.1) / (j.1| k.1| l.1) +#guide_area()) +plot_layout(guides ='collect')```**Figure SX**. As per Figure SX, except colours indicate our proxy for the baseline rate of ageing.## Run univariate GWASConduct GWAS and save the results.```{r}Arya_f_ageing <-prep_for_ageing_GWAS(Arya_2010_f)Arya_m_ageing <-prep_for_ageing_GWAS(Arya_2010_m)Huang_f_18_ageing <-prep_for_ageing_GWAS(Huang_2020_f_18)Huang_m_18_ageing <-prep_for_ageing_GWAS(Huang_2020_m_18)Huang_f_25_ageing <-prep_for_ageing_GWAS(Huang_2020_f_25)Huang_m_25_ageing <-prep_for_ageing_GWAS(Huang_2020_m_25)Huang_f_28_ageing <-prep_for_ageing_GWAS(Huang_2020_f_28)Huang_m_28_ageing <-prep_for_ageing_GWAS(Huang_2020_m_28)Wilson_f_ageing_1 <-prep_for_ageing_GWAS(Wilson_2020_f_1)Wilson_f_ageing_2 <-prep_for_ageing_GWAS(Wilson_2020_f_2)Durham_f_ageing <-prep_for_ageing_GWAS(Durham_2014_f)Patel_f_ageing <-prep_for_ageing_GWAS(Patel_2021_f)Arya_f_baseline_mortality <-prep_for_baseline_mortality_GWAS(Arya_2010_f)Arya_m_baseline_mortality <-prep_for_baseline_mortality_GWAS(Arya_2010_m)Huang_f_18_baseline_mortality <-prep_for_baseline_mortality_GWAS(Huang_2020_f_18)Huang_m_18_baseline_mortality <-prep_for_baseline_mortality_GWAS(Huang_2020_m_18)Huang_f_25_baseline_mortality <-prep_for_baseline_mortality_GWAS(Huang_2020_f_25)Huang_m_25_baseline_mortality <-prep_for_baseline_mortality_GWAS(Huang_2020_m_25)Huang_f_28_baseline_mortality <-prep_for_baseline_mortality_GWAS(Huang_2020_f_28)Huang_m_28_baseline_mortality <-prep_for_baseline_mortality_GWAS(Huang_2020_m_28)Wilson_f_baseline_mortality_1 <-prep_for_baseline_mortality_GWAS(Wilson_2020_f_1)Wilson_f_baseline_mortality_2 <-prep_for_baseline_mortality_GWAS(Wilson_2020_f_2)Durham_f_baseline_mortality <-prep_for_baseline_mortality_GWAS(Durham_2014_f)Patel_f_baseline_mortality <-prep_for_baseline_mortality_GWAS(Patel_2021_f)if(!file.exists("data/Derived/GWAS_results/Arya_f_ageing.tsv.gz")) {run_GWAS(Arya_f_ageing)run_GWAS(Arya_m_ageing)run_GWAS(Huang_f_18_ageing)run_GWAS(Huang_m_18_ageing)run_GWAS(Huang_f_25_ageing)run_GWAS(Huang_m_25_ageing)run_GWAS(Huang_f_28_ageing)run_GWAS(Huang_m_28_ageing)run_GWAS(Wilson_f_ageing_1)run_GWAS(Wilson_f_ageing_2)run_GWAS(Durham_f_ageing)run_GWAS(Patel_f_ageing)run_GWAS(Arya_f_baseline_mortality)run_GWAS(Arya_m_baseline_mortality)run_GWAS(Huang_f_18_baseline_mortality)run_GWAS(Huang_m_18_baseline_mortality)run_GWAS(Huang_f_25_baseline_mortality)run_GWAS(Huang_m_25_baseline_mortality)run_GWAS(Huang_f_28_baseline_mortality)run_GWAS(Huang_m_28_baseline_mortality)run_GWAS(Wilson_f_baseline_mortality_1)run_GWAS(Wilson_f_baseline_mortality_2)run_GWAS(Durham_f_baseline_mortality)run_GWAS(Patel_f_baseline_mortality)}Arya_f_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_f_ageing.tsv.gz") Arya_m_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_m_ageing.tsv.gz") Huang_f_18_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_18_ageing.tsv.gz")Huang_m_18_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_18_ageing.tsv.gz")Huang_f_25_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_25_ageing.tsv.gz")Huang_m_25_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_25_ageing.tsv.gz")Huang_f_28_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_28_ageing.tsv.gz")Huang_m_28_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_28_ageing.tsv.gz")Wilson_f_ageing_1_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_ageing_1.tsv.gz")Wilson_f_ageing_2_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_ageing_2.tsv.gz")Durham_f_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Durham_f_ageing.tsv.gz")Patel_f_ageing_GWAS <-read_tsv("data/Derived/GWAS_results/Patel_f_ageing.tsv.gz")Arya_f_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_f_baseline_mortality.tsv.gz") Arya_m_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Arya_m_baseline_mortality.tsv.gz") Huang_f_18_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_18_baseline_mortality.tsv.gz")Huang_m_18_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_18_baseline_mortality.tsv.gz")Huang_f_25_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_25_baseline_mortality.tsv.gz")Huang_m_25_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_25_baseline_mortality.tsv.gz")Huang_f_28_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_f_28_baseline_mortality.tsv.gz")Huang_m_28_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Huang_m_28_baseline_mortality.tsv.gz")Wilson_f_baseline_mortality_1_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_baseline_mortality_1.tsv.gz")Wilson_f_baseline_mortality_2_GWAS <-read_tsv("data/Derived/GWAS_results/Wilson_f_baseline_mortality_2.tsv.gz")Durham_f_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Durham_f_baseline_mortality.tsv.gz")Patel_f_baseline_mortality_GWAS <-read_tsv("data/Derived/GWAS_results/Patel_f_baseline_mortality.tsv.gz")```**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for the **rate of ageing**. The number of genomic regions indicates the number of genetic variants associated with the rate of ageing after LD pruning. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.```{r}# filter down to sig associationsageing_table <-bind_rows(tibble(`p < 1e-05 variants`=nrow(Arya_f_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_f_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_18_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_18_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_25_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_25_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_28_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_28_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_ageing_1_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_ageing_1_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_ageing_1_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_ageing_1_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_ageing_2_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_ageing_2_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_ageing_2_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_ageing_2_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020*",Treatment ="2",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Durham_f_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Durham_f_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Durham et al 2014",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Patel_f_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Patel_f_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Patel et al 2021",Treatment ="1",Sex ="Female",Temperature ="23",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Arya_m_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_m_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_18_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_18_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_25_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_25_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_28_ageing_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_ageing_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_28_ageing_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_ageing_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`), ) # how many unique variants have been detected?ageing_p_05_SNPs <-bind_rows( Arya_f_ageing_GWAS %>%filter(P <1e-05), Arya_m_ageing_GWAS %>%filter(P <1e-05), Huang_f_18_ageing_GWAS %>%filter(P <1e-05), Huang_f_25_ageing_GWAS %>%filter(P <1e-05), Huang_f_28_ageing_GWAS %>%filter(P <1e-05), Huang_m_18_ageing_GWAS %>%filter(P <1e-05), Huang_m_25_ageing_GWAS %>%filter(P <1e-05), Huang_m_28_ageing_GWAS %>%filter(P <1e-05), Wilson_f_ageing_1_GWAS %>%filter(P <1e-05), Wilson_f_ageing_2_GWAS %>%filter(P <1e-05), Durham_f_ageing_GWAS %>%filter(P <1e-05), Patel_f_ageing_GWAS %>%filter(P <1e-05) ) %>%distinct(SNP) %>%left_join(Genomic_regions %>%mutate(Pruned_variant ="YES")) ageing_table %>%add_row(Study ="Totals",Sex ="",Temperature ="",`p < 1e-05 variants`=nrow(ageing_p_05_SNPs),`p < 1e-05 genomic regions`=nrow(ageing_p_05_SNPs %>%filter(Pruned_variant =="YES")),`p < 1e-08 variants`=sum(ageing_table$`p < 1e-08 variants`),`p < 1e-08 genomic regions`=sum(ageing_table$`p < 1e-08 genomic regions`)) %>%kable() %>%kable_styling()```**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for **baseline mortality rate**. The number of genomic regions indicates the number of genetic variants associated with baseline mortality after LD pruning. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.```{r}# filter down to sig associationsscaling_table <-bind_rows(tibble(`p < 1e-05 variants`=nrow(Arya_f_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_f_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_f_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_18_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_18_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_18_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_25_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_25_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_25_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_f_28_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_f_28_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_f_28_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Female",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_baseline_mortality_1_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_baseline_mortality_1_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_baseline_mortality_1_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_baseline_mortality_1_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Wilson_f_baseline_mortality_2_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_baseline_mortality_2_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Wilson_f_baseline_mortality_2_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Wilson_f_baseline_mortality_2_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Wilson et al 2020*",Treatment ="2",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Durham_f_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Durham_f_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Durham_f_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Durham et al 2014",Treatment ="1",Sex ="Female",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Patel_f_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Patel_f_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Patel_f_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Patel et al 2021",Treatment ="1",Sex ="Female",Temperature ="23",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Arya_m_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Arya_m_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Arya_m_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Arya et al 2010",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Virgin") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_18_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_18_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_18_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="18",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_25_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_25_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_25_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="25",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`),tibble(`p < 1e-05 variants`=nrow(Huang_m_28_baseline_mortality_GWAS %>%filter(P <1e-05)),`p < 1e-05 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_baseline_mortality_GWAS %>%filter(P <1e-05))),`p < 1e-08 variants`=nrow(filter(Huang_m_28_baseline_mortality_GWAS, P <1e-08)),`p < 1e-08 genomic regions`=nrow(inner_join(Genomic_regions, Huang_m_28_baseline_mortality_GWAS %>%filter(P <1e-08)))) %>%mutate(Study ="Huang et al 2020",Treatment ="1",Sex ="Male",Temperature ="28",`Mating status`="Mated") %>% dplyr::select(Study, Sex, Temperature,`p < 1e-05 variants`, `p < 1e-05 genomic regions`, `p < 1e-08 variants`, `p < 1e-08 genomic regions`), ) # how many unique variants have been detected?scaling_p_05_SNPs <-bind_rows( Arya_f_baseline_mortality_GWAS %>%filter(P <1e-05), Arya_m_baseline_mortality_GWAS %>%filter(P <1e-05), Huang_f_18_baseline_mortality_GWAS %>%filter(P <1e-05), Huang_f_25_baseline_mortality_GWAS %>%filter(P <1e-05), Huang_f_28_baseline_mortality_GWAS %>%filter(P <1e-05), Huang_m_18_baseline_mortality_GWAS %>%filter(P <1e-05), Huang_m_25_baseline_mortality_GWAS %>%filter(P <1e-05), Huang_m_28_baseline_mortality_GWAS %>%filter(P <1e-05), Wilson_f_baseline_mortality_1_GWAS %>%filter(P <1e-05), Wilson_f_baseline_mortality_2_GWAS %>%filter(P <1e-05), Durham_f_baseline_mortality_GWAS %>%filter(P <1e-05), Patel_f_baseline_mortality_GWAS %>%filter(P <1e-05) ) %>%distinct(SNP) %>%left_join(Genomic_regions %>%mutate(Pruned_variant ="YES")) scaling_table %>%add_row(Study ="Totals",Sex ="",Temperature ="",`p < 1e-05 variants`=nrow(scaling_p_05_SNPs),`p < 1e-05 genomic regions`=nrow(scaling_p_05_SNPs %>%filter(Pruned_variant =="YES")),`p < 1e-08 variants`=sum(scaling_table$`p < 1e-08 variants`),`p < 1e-08 genomic regions`=sum(scaling_table$`p < 1e-08 genomic regions`)) %>%kable() %>%kable_styling()```## Applying cross-phenotype meta-analysis### Generate the genetic correlation matrixUsing SNP effect sizes, we calculate the genetic correlations between a) rates of ageing and b) baseline mortality, measured in different environmental contexts.```{r}# use the BETA coefficients to build the SNP correlation matrix for the rate of ageingSNP_ageing_axis_data <-bind_rows( Arya_f_ageing_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Female"), Arya_m_ageing_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Male"), Huang_f_18_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Female"), Huang_m_18_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Male"), Huang_f_25_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Female"), Huang_m_25_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Male"), Huang_f_28_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Female"), Huang_m_28_ageing_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Male"), Wilson_f_ageing_1_GWAS %>%mutate(Study ="Wilson_2020_1", Temperature =25, Sex ="Female"), Wilson_f_ageing_2_GWAS %>%mutate(Study ="Wilson_2020_2", Temperature =25, Sex ="Female"), Durham_f_ageing_GWAS %>%mutate(Study ="Durham_2014", Temperature =25, Sex ="Female"), Patel_f_ageing_GWAS %>%mutate(Study ="Patel_2021", Temperature =23, Sex ="Female")) %>% dplyr::select(SNP, BETA, Study, Temperature, Sex) %>%pivot_wider(values_from = BETA, names_from =c(Study, Temperature, Sex)) SNP_ageing_axis_LD_pruned <- SNP_ageing_axis_data %>%inner_join(Genomic_regions)SNP_ageing_axis_corr_matrix <-cor(SNP_ageing_axis_LD_pruned %>% dplyr::select(-SNP), use ="pairwise.complete.obs", method ="spearman")# use the BETA coefficients to build the SNP correlation matrix for scalingSNP_baseline_mortality_axis_data <-bind_rows( Arya_f_baseline_mortality_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Female"), Arya_m_baseline_mortality_GWAS %>%mutate(Study ="Arya_2010", Temperature =25, Sex ="Male"), Huang_f_18_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Female"), Huang_m_18_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =18, Sex ="Male"), Huang_f_25_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Female"), Huang_m_25_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =25, Sex ="Male"), Huang_f_28_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Female"), Huang_m_28_baseline_mortality_GWAS %>%mutate(Study ="Huang_2020", Temperature =28, Sex ="Male"), Wilson_f_baseline_mortality_1_GWAS %>%mutate(Study ="Wilson_2020_1", Temperature =25, Sex ="Female"), Wilson_f_baseline_mortality_2_GWAS %>%mutate(Study ="Wilson_2020_2", Temperature =25, Sex ="Female"), Durham_f_baseline_mortality_GWAS %>%mutate(Study ="Durham", Temperature =25, Sex ="Female"), Patel_f_baseline_mortality_GWAS %>%mutate(Study ="Patel", Temperature =23, Sex ="Female")) %>% dplyr::select(SNP, BETA, Study, Temperature, Sex) %>%pivot_wider(values_from = BETA, names_from =c(Study, Temperature, Sex))SNP_baseline_mortality_axis_LD_pruned <- SNP_baseline_mortality_axis_data %>%inner_join(Genomic_regions)SNP_baseline_mortality_axis_corr_matrix <-cor(SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(-SNP), use ="pairwise.complete.obs", method ="spearman")```### Calculate meta-analytic test statisticsThe purpose of these meta-analyses is to detect SNPs associated with 1) the rate of ageing and 2) baseline mortality rate.**Run CPASSOC for the rate of ageing**```{r}# rate of ageingageing_axis_Arya_f_T <- Arya_f_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)ageing_axis_Arya_m_T <- Arya_m_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)ageing_axis_Huang_f_18_T <- Huang_f_18_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)ageing_axis_Huang_m_18_T <- Huang_m_18_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)ageing_axis_Huang_f_25_T <- Huang_f_25_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)ageing_axis_Huang_m_25_T <- Huang_m_25_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)ageing_axis_Huang_f_28_T <- Huang_f_28_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)ageing_axis_Huang_m_28_T <- Huang_m_28_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)ageing_axis_Wilson_f_1_T <- Wilson_f_ageing_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_1 = T)ageing_axis_Wilson_f_2_T <- Wilson_f_ageing_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_2 = T)ageing_axis_Durham_f_T <- Durham_f_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f = T)ageing_axis_Patel_f_T <- Patel_f_ageing_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f = T)ageing_axis_t_stats <- ageing_axis_Arya_f_T %>%inner_join(ageing_axis_Arya_m_T, by ="SNP") %>%inner_join(ageing_axis_Huang_f_18_T, by ="SNP") %>%inner_join(ageing_axis_Huang_m_18_T, by ="SNP") %>%inner_join(ageing_axis_Huang_f_25_T, by ="SNP") %>%inner_join(ageing_axis_Huang_m_25_T, by ="SNP") %>%inner_join(ageing_axis_Huang_f_28_T, by ="SNP") %>%inner_join(ageing_axis_Huang_m_28_T, by ="SNP") %>%inner_join(ageing_axis_Wilson_f_1_T, by ="SNP") %>%inner_join(ageing_axis_Wilson_f_2_T, by ="SNP") %>%inner_join(ageing_axis_Durham_f_T, by ="SNP") %>%inner_join(ageing_axis_Patel_f_T, by ="SNP") ageing_axis_t_stat_values <- ageing_axis_t_stats %>% dplyr::select(2:13)Sample_size_ageing_axis <-c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)if(!file.exists("data/Derived/GWAS_results/ageing_axis_meta_results.csv")) {# run the homogeneous effect meta-analysisS_hom <-SHom(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits e.g. if a SNP has a sex or environment opposite effect on lifespan# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix);S_het <-SHet(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)ageing_axis_meta_results <- ageing_axis_t_stats %>%bind_cols(p_S_hom, p_S_het) # add the unadjusted p valueswrite_csv(ageing_axis_meta_results, "data/Derived/GWAS_results/ageing_axis_meta_results.csv")} else ageing_axis_meta_results <-read_csv("data/Derived/GWAS_results/ageing_axis_meta_results.csv")```**Run CPASSOC for the baseline rate of mortality**```{r}baseline_mortality_axis_Arya_f_T <- Arya_f_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_f = T)baseline_mortality_axis_Arya_m_T <- Arya_m_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Arya_m = T)baseline_mortality_axis_Huang_f_18_T <- Huang_f_18_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_18 = T)baseline_mortality_axis_Huang_m_18_T <- Huang_m_18_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_18 = T)baseline_mortality_axis_Huang_f_25_T <- Huang_f_25_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_25 = T)baseline_mortality_axis_Huang_m_25_T <- Huang_m_25_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_25 = T)baseline_mortality_axis_Huang_f_28_T <- Huang_f_28_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_f_28 = T)baseline_mortality_axis_Huang_m_28_T <- Huang_m_28_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Huang_m_28 = T)baseline_mortality_axis_Wilson_f_1_T <- Wilson_f_baseline_mortality_1_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_1 = T)baseline_mortality_axis_Wilson_f_2_T <- Wilson_f_baseline_mortality_2_GWAS %>% dplyr::select(SNP, T) %>%rename(Wilson_f_2 = T)baseline_mortality_axis_Durham_f_T <- Durham_f_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Durham_f = T)baseline_mortality_axis_Patel_f_T <- Patel_f_baseline_mortality_GWAS %>% dplyr::select(SNP, T) %>%rename(Patel_f = T)baseline_mortality_axis_t_stats <- baseline_mortality_axis_Arya_f_T %>%inner_join(baseline_mortality_axis_Arya_m_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_f_18_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_m_18_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_f_25_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_m_25_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_f_28_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Huang_m_28_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Wilson_f_1_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Wilson_f_2_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Durham_f_T, by ="SNP") %>%inner_join(baseline_mortality_axis_Patel_f_T, by ="SNP") baseline_mortality_axis_t_stat_values <- baseline_mortality_axis_t_stats %>% dplyr::select(2:13)Sample_size_baseline_mortality_axis <-c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)if(!file.exists("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")) {# run the homogeneous effect meta-analysisS_hom <-SHom(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)# calculate meta-p-values and bind the two together with the SNP namesp_S_hom <-pchisq(S_hom, df =1, ncp =0, lower.tail = F) %>%as_tibble() %>%bind_cols(S_hom) %>%rename(meta_p_hom = value, S_hom = ...2)# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits (e.g. if a SNP has a sex or enviornment opposite effect on lifespan)# estimate parameters of gamma distributionpara <-EstimateGamma(N =1E4, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix);S_het <-SHet(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)# obtain P-values of S_Het using the estimated gamma parametersp_S_het <-pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>%as_tibble() %>%bind_cols(S_het) %>%rename(meta_p_het = value, S_het = ...2)baseline_mortality_axis_meta_results <- baseline_mortality_axis_t_stats %>%bind_cols(p_S_hom, p_S_het) # add the unadjusted p valueswrite_csv(baseline_mortality_axis_meta_results, "data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")} else baseline_mortality_axis_meta_results <-read_csv("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")```## Visualise the resultsWe combine GWAS $T$ statistics calculated for the rate of ageing and baseline mortality measured across different contexts. It's possible that some SNPs have G x E interactions that lead to a heterogeneous effect across phenotypes. We therefore utilise the `S_het` calculated p-values.First lets show the effect of `CPASSOC` on the number of variants found to be associated with the rate of ageing and the scaling of mortality risk.**Table SX**. the number of variants associated with ageing rate and baseline mortality at various significance thresholds, estimated by univariate GWAS or CPASSOC. The number of genomic regions indicates the number of variants detected after LD pruning.```{r}tibble(Analysis =c("CPASSOC", "Univariate", "CPASSOC", "Univariate"),Trait =c("Ageing rate", "Ageing rate", "Scaling", "Scaling"),`p < 1e-05 variants`=c(sum(ageing_axis_meta_results$meta_p_het <1e-05),nrow(ageing_p_05_SNPs),sum(baseline_mortality_axis_meta_results$meta_p_het <1e-05),nrow(scaling_p_05_SNPs)),`p < 1e-05 genomic regions`=c(nrow(ageing_axis_meta_results %>%filter(meta_p_het <1e-05) %>%inner_join(Genomic_regions)),nrow(ageing_p_05_SNPs %>%filter(Pruned_variant =="YES")),nrow(baseline_mortality_axis_meta_results %>%filter(meta_p_het <1e-05) %>%inner_join(Genomic_regions)),nrow(scaling_p_05_SNPs %>%filter(Pruned_variant =="YES"))),`p < 1e-08 variants`=c(sum(ageing_axis_meta_results$meta_p_het <1e-08),sum(ageing_table$`p < 1e-08 variants`),sum(baseline_mortality_axis_meta_results$meta_p_het <1e-08),sum(scaling_table$`p < 1e-08 variants`)),`p < 1e-08 genomic regions`=c(nrow(ageing_axis_meta_results %>%filter(meta_p_het <1e-08) %>%inner_join(Genomic_regions)),sum(ageing_table$`p < 1e-08 genomic regions`),nrow(baseline_mortality_axis_meta_results %>%filter(meta_p_het <1e-08) %>%inner_join(Genomic_regions)),sum(scaling_table$`p < 1e-08 genomic regions`))) %>%kable() %>%kable_styling()```**Table SX**. genes that encompass or are very close to the genetic variants that have associations with the rate of ageing.```{r}# join gene annotations with the list of analysed variants # note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.ageing_rate_genes <- ageing_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^9, 3)/10^9,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)ageing_rate_genes %>%my_data_table()```**Table SX**. genes that encompass or are very close to the genetic variants that have associations with baseline mortality rate.```{r}scaling_genes <- baseline_mortality_axis_meta_results %>%filter(meta_p_het <1e-08) %>% dplyr::select(SNP, S_het, meta_p_het) %>%left_join(annotations %>%filter(distance.to.gene <=500)) %>%mutate(meta_p_het =signif(meta_p_het*10^10, 3)/10^10,S_het =round(S_het, 3)) %>% dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)scaling_genes %>%my_data_table()```Now lets build some 'Manhattan plots' to show where these significant associations can be found:```{r, fig.width=11, eval=TRUE}#| column: pageageing_axis_results <- ageing_axis_meta_results %>% inner_join(Genomic_regions) %>% dplyr::select(SNP, meta_p_hom, meta_p_het) %>% rename(P = meta_p_het) %>% mutate(logp = -log10(P))baseline_mortality_axis_results <- baseline_mortality_axis_meta_results %>% inner_join(Genomic_regions) %>% dplyr::select(SNP, meta_p_hom, meta_p_het) %>% rename(P = meta_p_het) %>% mutate(logp = -log10(P))# plot the results using the manhattan plot custom function we defined earlierageing_axis_het_plot <- build_manhattan_plot(ageing_axis_results) + labs(title = "Ageing rate") + theme(plot.title = element_text(size = 20, hjust = 0.5)) + scale_y_continuous(limits = c(0, 19), expand = c(0, 0))baseline_mortality_axis_het_plot <- build_manhattan_plot(baseline_mortality_axis_results) + labs(title = "Baseline mortality") + theme(plot.title = element_text(size = 20, hjust = 0.5)) + scale_y_continuous(limits = c(0, 19), expand = c(0, 0))baseline_mortality_axis_het_plot + ageing_axis_het_plot ```**Figure XX**. Manhattan plots showing the -Log~10~ *p*-value for each locus' effect on baseline mortality and the rate of ageing.Plot the univariate effect sizes for each of the genomic regions associated with the rate of ageing at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC.```{r, fig.height=9}SNP_heatmap_ageing_axis <- SNP_ageing_axis_data %>% inner_join( ageing_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% inner_join(Genomic_regions))row_name <- SNP_heatmap_ageing_axis$SNPSNP_heatmap_ageing_axis <- SNP_heatmap_ageing_axis %>% dplyr::select(-SNP) %>% as.matrix()rownames(SNP_heatmap_ageing_axis) <- row_namebreaksList <- seq(-0.1, 0.1, by = 0.001)annotation_SNPs <- ageing_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L", str_detect(SNP, "2R") ~ "2R", str_detect(SNP, "3L") ~ "3L", str_detect(SNP, "3R") ~ "3R", str_detect(SNP, "X") ~ "X"))annotation_studies <- tibble(Study = c("Arya_2010_f_25", "Huang_2020_f_18", "Huang_2020_f_25", "Huang_2020_f_28", "Wilson_2020_f_25_1", "Wilson_2020_f_25_2", "Durham_2014_f_25", "Patel_2021_f_23", "Arya_2010_m_25", "Huang_2020_m_18", "Huang_2020_m_25", "Huang_2020_m_28"), Temperature = c("25", "18", "25", "28", "25", "25", "25", "23", "25", "18", "25", "28")) %>% mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female", .default = "Male"), Mating = case_when(str_detect(Study, "Arya") ~ "NO", str_detect(Study, "Huang") ~ "Throughout life", str_detect(Study, "Wilson") ~ "Early life", str_detect(Study, "Durham") ~ "Throughout life", str_detect(Study, "Patel") ~ "Early life"), Author = str_extract(Study, ".*(?=\\_)"), Author = str_remove(Author, "_f"), Author = str_remove(Author, "_m"))# create a study annotation column, need this to be a data.frame rather than a tibble for some reason Study_details <- annotation_studies %>% as.data.frame() %>% dplyr::select(Study, Temperature, Mating)my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2], Mating = Study_details[, 3])my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw "23" = "#d0e2af", "25" = "#f5db99", "28" = "#e89c81"), Mating = c("NO" = "#f8e3d1", # Shuksan from pnw "Early life" = "#d7b1c5", "Throughout life" = "#ac8eab"), Chromosome = c("2L" = "#d8aedd", # lake colours from pnw "2R" = "#cb74ad", "3L" = "#11c2b5", "3R" = "#72e1e1", "X" = "#fbcc74"))# create a SNP annotation columnSNP_details <- annotation_SNPs %>% as.data.frame()my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females", "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females", "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males", "Huang et al males") pheatmap(SNP_heatmap_ageing_axis, breaks = breaksList, main = "", color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)), legend = TRUE, cutree_rows = 6, cutree_cols = 5, angle_col = 45, border_color = "white", annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories, fontsize = 8, labels_col = my_col_names)```**Figure XX**. univariate effect sizes for each of the genomic regions associated with ageing rate at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases ageing rate in the conditions the study was performed in.Plot the univariate effect sizes for each of the genomic regions associated with the scaling of mortality risk at the genome-wide significance threshold (p \< $0.05^{-8}$) after CPASSOC.```{r, fig.height=9}SNP_heatmap_baseline_mortality_axis <- SNP_baseline_mortality_axis_data %>% inner_join( baseline_mortality_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% inner_join(Genomic_regions))row_name <- SNP_heatmap_baseline_mortality_axis$SNPSNP_heatmap_baseline_mortality_axis <- SNP_heatmap_baseline_mortality_axis %>% dplyr::select(-SNP) %>% as.matrix()rownames(SNP_heatmap_baseline_mortality_axis) <- row_namebreaksList <- seq(-7, 7, by = 0.01)annotation_SNPs <- baseline_mortality_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L", str_detect(SNP, "2R") ~ "2R", str_detect(SNP, "3L") ~ "3L", str_detect(SNP, "3R") ~ "3R", str_detect(SNP, "X") ~ "X"))annotation_studies <- tibble(Study = c("Arya_2010_f_25", "Huang_2020_f_18", "Huang_2020_f_25", "Huang_2020_f_28", "Wilson_2020_f_25_1", "Wilson_2020_f_25_2", "Durham_2014_f_25", "Patel_2021_f_23", "Arya_2010_m_25", "Huang_2020_m_18", "Huang_2020_m_25", "Huang_2020_m_28"), Temperature = c("25", "18", "25", "28", "25", "25", "25", "23", "25", "18", "25", "28")) %>% mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female", .default = "Male"), Mating = case_when(str_detect(Study, "Arya") ~ "NO", str_detect(Study, "Huang") ~ "Throughout life", str_detect(Study, "Wilson") ~ "Early life", str_detect(Study, "Durham") ~ "Throughout life", str_detect(Study, "Patel") ~ "Early life"), Author = str_extract(Study, ".*(?=\\_)"), Author = str_remove(Author, "_f"), Author = str_remove(Author, "_m"))# create a study annotation column, need this to be a data.frame rather than a tibble for some reason Study_details <- annotation_studies %>% as.data.frame() %>% dplyr::select(Study, Temperature, Mating)my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2], Mating = Study_details[, 3])my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw "23" = "#d0e2af", "25" = "#f5db99", "28" = "#e89c81"), Mating = c("NO" = "#f8e3d1", # Shuksan from pnw "Early life" = "#d7b1c5", "Throughout life" = "#ac8eab"), Chromosome = c("2L" = "#d8aedd", # lake colours from pnw "2R" = "#cb74ad", "3L" = "#11c2b5", "3R" = "#72e1e1", "X" = "#fbcc74"))# create a SNP annotation columnSNP_details <- annotation_SNPs %>% as.data.frame()my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females", "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females", "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males", "Huang et al males") pheatmap(SNP_heatmap_baseline_mortality_axis, breaks = breaksList, main = "", color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)), legend = TRUE, cutree_rows = 6, cutree_cols = 5, angle_col = 45, border_color = "white", annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories, fontsize = 8, labels_col = my_col_names)```**Figure XX**. univariate effect sizes for each of the SNPs associated with mortality scaling at the genome-wide significance threshold (p \< $0.05^{-8}$) after CPASSOC. Effect sizes are expressed in standard deviations from the mean life expectancy found in each study. Studies are clustered by similiarity in SNP effects on the X axis, while SNPs are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases life expectancy in the conditions the study was performed in.# Are ageing and baseline mortality polygenic?If traits are polygenic, the majority of the genetic variants that effect their expession will have effects that are too small to detect with GWA, unless sample sizes are truly gigantic. A promising alternative is to instead look to see if effects estimated in one study can be replicated in a second, independent study. To test this in our dataset, we selected one trait measurement from each study trait that phenotyped females, at 25C, with an opportunity for mating.As a control, this is what happens if we bin and plot the relationship between two uncorrelated variables```{r}sim_data <-tibble(draw_1 =rnorm(220437, 0, 1),draw_2 =rnorm(220437, 0, 1)) %>%arrange(draw_1) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(draw_1 =mean(draw_1), draw_2 =mean(draw_2))(boyle_plot_sim <- sim_data %>%ggplot(aes(draw_1, draw_2)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-4, 4), ylim =c(-4, 4)) +xlab("Mean effect size \n (random draw 1)") +ylab("Mean effect size \n (random draw 2)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =14)))``````{r}ageing_boyle_data <- SNP_ageing_axis_LD_pruned %>% dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>%filter_at(vars(2:4), all_vars(!is.na(.))) %>%# remove NAsarrange(Huang_2020_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Huang_2020_25_Female =mean(Huang_2020_25_Female), Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_2014_25_Female =mean(Durham_2014_25_Female))ageing_boyle_data_2 <- SNP_ageing_axis_LD_pruned %>% dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>%filter_at(vars(2:3), all_vars(!is.na(.))) %>%# remove NAsarrange(Wilson_2020_1_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_2014_25_Female =mean(Durham_2014_25_Female))baseline_mortality_boyle_data <- SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_25_Female) %>%filter_at(vars(2:4), all_vars(!is.na(.))) %>%# remove NAsarrange(Huang_2020_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Huang_2020_25_Female =mean(Huang_2020_25_Female), Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_25_Female =mean(Durham_25_Female))baseline_mortality_boyle_data_2 <- SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_25_Female) %>%filter_at(vars(2:3), all_vars(!is.na(.))) %>%# remove NAsarrange(Wilson_2020_1_25_Female) %>%mutate(bin =c(rep(1:floor(n()/100), each =100),rep(floor(n()/100) +1, each =n() %%100))) %>%group_by(bin) %>%summarise(Wilson_2020_1_25_Female =mean(Wilson_2020_1_25_Female),Durham_25_Female =mean(Durham_25_Female))boyle_plot_H_W <- ageing_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-0.16, 0.16), ylim =c(-0.1, 0.1)) +xlab("Ageing SNP effect (Huang et al.)") +ylab("Ageing SNP effect (Wilson et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))boyle_plot_H_D <- ageing_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Durham_2014_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-0.16, 0.16), ylim =c(-0.1, 0.1)) +labs(x ="Ageing SNP effect (Huang et al.)",y ="Ageing SNP effect (Durham et al.)") +theme_bw() +theme(plot.title =element_text(hjust =0.5),text =element_text(size =10))boyle_plot_W_D <- ageing_boyle_data_2 %>%ggplot(aes(Wilson_2020_1_25_Female, Durham_2014_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-0.16, 0.16), ylim =c(-0.1, 0.1)) +xlab("Ageing SNP effect (Wilson et al.)") +ylab("Ageing SNP effect (Durham et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))boyle_baseline_plot_H_W <- baseline_mortality_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-5, 5), ylim =c(-3.125, 3.125)) +labs(x ="Scaling SNP effect (Huang et al.)",y ="Scaling SNP effect (Wilson et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))boyle_baseline_plot_H_D <- baseline_mortality_boyle_data %>%ggplot(aes(Huang_2020_25_Female, Durham_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-5, 5), ylim =c(-3.125, 3.125)) +labs(x ="Scaling SNP effect (Huang et al.)",y ="Scaling SNP effect (Durham et al.)") +theme_bw() +theme(plot.title =element_text(hjust =0.5),text =element_text(size =10))boyle_baseline_plot_W_D <- baseline_mortality_boyle_data_2 %>%ggplot(aes(Wilson_2020_1_25_Female, Durham_25_Female)) +geom_hline(yintercept =0, linetype =2) +geom_vline(xintercept =0, linetype =2) +geom_point(alpha =0.8, size =2.2) +stat_smooth(method ="lm", formula = y ~ x +I(x^2), linewidth =0.75) +coord_cartesian(xlim =c(-5, 5), ylim =c(-3.125, 3.125)) +labs(x ="Scaling SNP effect (Wilson et al.)",y ="Scaling SNP effect (Durham et al.)") +theme_bw() +theme(strip.background =element_blank(),strip.text =element_text(hjust=0)) +theme(text =element_text(size =10))(boyle_plot_H_W + boyle_plot_H_D + boyle_plot_W_D) / (boyle_baseline_plot_H_W + boyle_baseline_plot_H_D + boyle_baseline_plot_W_D) ```**Figure SX**. Each point represents the mean effect size for a group of 100 genomic regions, ordered by association with female ageing rate (top panels) or female baseline mortality risk (bottom panels), measured in the study named on the x-axis. While traits were measured different laboratories, conditions were similar in each treatment: females were housed at 25C, with an opportunity for mating. Effect sizes are expressed as trait standard deviations.# Figure 4```{r, fig.width=8.3, fig.height=6.97, eval=TRUE}f4_a <- c + labs(title = NULL) + theme(legend.position="none")f4_b <- e + labs(title = NULL) + theme(legend.position = "none")f4_c <- g + labs(title = NULL) + theme(legend.position = "none")part_1 <- (f4_a + f4_b + f4_c) + plot_layout(#guides = collect, axis_titles = "collect")f4_e <- boyle_plot_H_W + labs(x = "SNP effect (Huang et al.)", y = "SNP effect (Wilson et al.)") f4_f <- boyle_plot_H_D + labs(x = "SNP effect (Huang et al.)", y = "SNP effect (Durham et al.)") f4_g <- boyle_plot_W_D + labs(x = "SNP effect (Wilson et al.)", y = "SNP effect (Durham et al.)") part_3 <- (f4_e + f4_f + f4_g)part_1 / (ageing_axis_het_plot + labs(title = NULL)) / part_3 + plot_annotation(tag_levels = "A")```**Figure 4.** detection of genetic variants associated with the rate of ageing. **A**-**C** demonstrate our ageing rate metric used for genome-wide association analysis. Dashed lines show simulations from the gompertz distribution: each line was generated with a different rate of ageing value and extends as the baseline mortality rate changes. Note that the slope from the regressions of lifespan equality on life expectancy align closely with these curves. Points show fly genotypes; deviations from the regression line therefore indicate that genotypes differ in the rate of ageing.